Abstract


We consider the distributed detection problem in trees with unbounded height. The first
configuration we study in this report is a balanced binary relay tree, where the leaves
of the tree correspond to N identical and independent sensors. Only the leaves are
sensors. The root of the tree represents a fusion center that makes the overall detection
decision. Each of the other nodes in the tree is a relay node that combines two binary
messages to form a single output binary message. In this way, the information from
the sensors is aggregated into the fusion center via the relay nodes. In Chapter II,
we assume that the fusion rules are unit-threshold likelihood-ratio tests, which are
locally optimal in the sense of minimizing the total error probability after fusion. We
describe the evolution of the Type I and Type II error probabilities of the binary data
as it propagates from the leaves towards the root. Tight upper and lower bounds for
the total error probability at the fusion center as functions of N are derived. These
bounds characterize how fast the total error probability converges to 0 with respect to N,
even if the individual sensors have error probabilities that converge to 1/2.

In Chapter III, we study the detection performance of balanced binary relay trees
where sensors fail with a certain probability, in which case we show that the scaling law
for the decay rate of the total error probability remains $\sqrt{N}$. Moreover, we study the
case where the communication links in the tree network fail with certain probabilities.
Not surprisingly, the step-wise reduction of the total detection error probability is
slower than in the case where the network has no communication link failures. We show
that, under the assumption of identical communication link failure probability in the
tree, the exponent of the total error probability at the fusion center is $o(\sqrt{N})$ in the
asymptotic regime. In addition, if the given communication link failure probabilities
decrease to 0 as communications get closer to the fusion center, then the decay exponent
of the total error probability is $\Theta(\sqrt{N})$, provided that the decay of the failure
probabilities is sufficiently fast.

In Chapter IV, we call the set of all fusion rules in the tree a fusion strategy. We
study the fusion strategy that maximizes the reduction in the total error probability
between the sensors and the fusion center. We formulate this optimization problem as
a deterministic dynamic program. For trees with finite height, we provide the explicit
optimal strategy. Moreover, we show that the reduction in the total error probability is a
submodular function. Hence the greedy strategy, which only maximizes the level-wise
reduction in the total error probability, is close to the globally optimal strategy in terms
of the reduction in the probability of error.

In Chapter V, we consider a more general M-ary relay tree configuration, where
each non-leaf node in the tree has M child nodes and only binary messages are allowed
to be communicated throughout the tree. Similarly, we derive tight upper and lower
bounds for the Type I and II error probabilities at the fusion center as explicit functions
of the number of sensors. These bounds characterize how fast the error probabilities
converge to 0 with respect to the number of sensors. Building on the work on the
detection performance of M-ary relay trees with binary messages, we further study the
case of non-binary relay message alphabets. We characterize the exponent of the error
probability with respect to the message alphabet size V, showing how the detection
performance increases with V. Our method involves reducing a tree with non-binary
relay messages into an equivalent higher-degree tree with only binary messages.

Last, the connections between information geometry and the performance of sensor
networks for target tracking are explored to pursue a better understanding of placement,
planning and scheduling issues. Firstly, the integrated Fisher information distance
(IFID) between the states of two targets is analyzed by solving the geodesic
equations and is adopted as a measure of target resolvability by the sensor. The
differences between the IFID and the well-known Kullback-Leibler divergence (KLD)
are highlighted. We also explain how the energy functional, which is the "integrated,
differential" KLD, relates to the other distance measures. Secondly, the structures of
statistical manifolds are elucidated by computing the canonical Levi-Civita affine
connection as well as Riemannian and scalar curvatures. We show the relationship between
the Ricci curvature tensor field and the amount of information that can be obtained by
the network sensors. Finally, an analytical presentation of statistical manifolds as an
immersion in the Euclidean space for distributions of exponential type is given. The
significance and potential of information geometry for addressing system definition and
planning issues, such as the sensing capability to distinguish closely spaced targets,
calculation of the amount of information collected by sensors, and the problem of
optimal scheduling of network sensors and resources, are demonstrated. The proposed
analysis techniques are presented via three basic sensor network scenarios: a simple
range-bearing radar, two bearings-only passive sonars, and three ranges-only detectors.
Chapter 1


Introduction 


Distributed detection has been studied intensively over the last 30 years. This problem
is similar to a classical detection or hypothesis testing problem, except that each sensor
makes an individual measurement but is only allowed to communicate a compressed
version of its raw measurement. The objective is to choose the most 'informative'
summarized messages such that a global objective function is optimized; for example,
such that the probability of error for the detection decision is minimized. This problem
is also known as decentralized detection or data fusion in the literature. The subject is
also of interest from the social learning perspective, which focuses on how fast one peer
can learn from other peers in a social network. We begin our discussion with some
background and a survey of this intriguing problem.


1.1  Background  and  Related  Work 


Consider a hypothesis testing problem under two scenarios: centralized and decentralized.
Under the centralized network scenario, all sensors send their raw measurements
to the fusion center, which makes a decision based on these measurements. In the
decentralized network introduced in [1], sensors send summaries of their measurements
and observations to the fusion center. The fusion center then makes a decision. In a
decentralized network, information is summarized into smaller messages. Evidently,
the decentralized network cannot perform better than the centralized network. Its
advantage lies in its limited use of resources and bandwidth: by transmitting
summarized information, it is more practical and efficient.

The decentralized network in [1] involves the parallel architecture, also known as
the star architecture [1]–[17], [40], in which all sensors directly connect to the fusion
center. Most of the literature focuses on how to quantize the measurements
so that the probability of error after fusion is minimized. Another perspective is to
study how fast the error probability decays with respect to the number of sensors in
a large-scale network. A typical result is that, under the assumption of (conditional)
independence of the sensor observations, the decay rate of the error probability in the
parallel architecture is exponential [6].

Several different sensor topologies have been studied under the assumption of
conditional independence. The first configuration considered for such a fusion network
was the tandem network [18]–[22], [40], in which each non-leaf node combines the
information from its own sensor with the message it has received from the node one
level down, and then transmits the result to the node at the next level up. The decay rate
of the error probability in this case is sub-exponential [22]. Specifically, as the number
of sensors N goes to infinity, the exponent of the error probability is dominated by $N^d$
asymptotically for all $d > 1/2$ [20]. This sensor network represents a situation where
the length of the network is the longest possible among all networks with N leaf nodes.

The asymptotic performance of single-rooted tree networks with bounded height is
discussed in [23]–[31], [40]. Even though the error probability in the parallel
configuration decreases exponentially, in a practical implementation, the resources consumed
in having each sensor transmit directly to the fusion center might be regarded as
excessive. Energy consumption can be reduced by setting up a directed tree, rooted at the
fusion center. In this tree structure, measurements are summarized by leaf sensor nodes
and sent to their parent nodes, each of which fuses all the messages it receives with its
own measurement (if any) and then forwards the new message to its parent node at
the next level. This process takes place throughout the tree, culminating in the fusion
center, where a final decision is made. For a bounded-height tree configuration under
the Neyman-Pearson criterion, the optimal error exponent is as good as that of the
parallel configuration under certain conditions. For example, for a bounded-height tree
network with $\lim_{\tau_N \to \infty} \ell_N/\tau_N = 1$, where $\tau_N$ denotes the total
number of nodes and $\ell_N$ denotes the number of leaf nodes, the optimal error exponent
is the same as that of the parallel configuration [24], [26]. For a bounded-height tree
configuration under the Bayesian criterion, the error probability decays exponentially
fast to 0 with an error exponent which is worse than the one associated with the parallel
configuration [27].

The variation of detection performance with increasing tree height is still largely
unexplored. If only the leaf nodes have sensors making observations, and all other
nodes simply fuse the messages received and forward the new messages to their parents,
the tree network is known as a relay tree. The balanced binary relay tree has
been addressed in [32], in which it is assumed that the leaf nodes are independent
sensors with identical Type I error probability (also known as the probability of false
alarm, denoted by $\alpha_0$) and identical Type II error probability (also known as the
probability of missed detection, denoted by $\beta_0$). It is shown there that if the sensor
error probabilities satisfy the condition $\alpha_0 + \beta_0 < 1$, then both the Type I and
Type II error probabilities at the fusion center converge to 0 as N goes to infinity. If
$\alpha_0 + \beta_0 > 1$, then both the Type I and Type II error probabilities converge to 1,
which means that if we flip the decision at the fusion center, then the Type I and Type II
error probabilities converge to 0. Because of this symmetry, it suffices to consider the
case where $\alpha_0 + \beta_0 < 1$. If $\alpha_0 + \beta_0 = 1$, then the Type I and II error
probabilities add up to 1 at each node of the tree. In consequence, this case is not of
interest.

In [36], this problem was considered in an M-ary relay tree configuration, where
each node with the exception of the sensors has M child nodes. Notice that balanced
binary relay trees are simply special cases of M-ary relay trees. To describe the result
in [36], let $P_N$ be the total error probability at the fusion center and suppose that each
sensor and relay node only transmits binary messages upward to a node at the next level.
Then, it is shown in [36] that with any combination of fusion rules, the decay exponent
is upper bounded:

$\log_2 P_N^{-1} = O\left(N^{\log_M \frac{M+1}{2}}\right).$

The case where the relay nodes and the fusion center use the majority dominance rule
(with random tie-breaking) to combine messages was also considered in [36], in which
case the decay rate of the total error probability is almost optimal. More precisely,

$\log_2 P_N^{-1} = \Omega\left(N^{\log_M \lfloor \frac{M+1}{2} \rfloor}\right).$

Therefore, in the case where M is odd, the majority dominance rule achieves the
optimal decay exponent. In the case where M is even, there is a gap between the two
bounds. This gap is evident in the case of the balanced binary relay tree (M = 2), where
it is easy to show that the Type I and II error probabilities do not change after fusion
with the majority dominance rule. This shows that the lower bound above is tight.
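The claim for M = 2 can be checked with a one-line calculation. The sketch below assumes, for illustration, that the two child messages are conditionally independent with common Type I and Type II error probabilities $\alpha$ and $\beta$, and that a tie (one 0 and one 1) is broken by a fair coin.

```latex
\begin{align*}
\alpha' &= \underbrace{\alpha^2}_{\text{both children report } 1}
          + \tfrac{1}{2}\cdot\underbrace{2\alpha(1-\alpha)}_{\text{tie}}
        = \alpha, \\
\beta'  &= \beta^2 + \tfrac{1}{2}\cdot 2\beta(1-\beta) = \beta .
\end{align*}
```

Under these assumptions the pair is unchanged at every level, so the error probability at the root equals that of a single sensor and $\log_2 P_N^{-1}$ does not grow with N, in line with the exponent $N^{\log_2 1} = 1$ in the lower bound.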

In this report, we develop several new results concerning the detection performance
of trees with unbounded height. We show that in trees with unbounded height, the
total error probability decays sub-exponentially fast, and we derive the explicit
asymptotic decay exponent.




Chapter  2 


Balanced  Binary  Relay  Trees 


In this chapter, we consider the balanced binary relay tree configuration and describe
the precise evolution of the Type I and Type II error probabilities in this case. In
addition, we provide upper and lower bounds for the total error probability at the fusion
center as functions of N. These characterize the decay rate of the total error probability.
We also show that the total error probability converges to 0 under certain conditions even
if the sensors are asymptotically crummy, that is, $\alpha_0 + \beta_0 \to 1$.


2.1  Problem  Formulation 


We consider the problem of binary hypothesis testing between $H_0$ and $H_1$ in a balanced
binary relay tree. Leaf nodes are sensors undertaking initial and independent detections
of the same event in a scene. These measurements are summarized into binary messages
and forwarded to nodes at the next level. Each non-leaf node with the exception
of the root, the fusion center, is a relay node, which fuses two binary messages into
one new binary message and forwards the new binary message to its parent node. This
process takes place at each node, culminating in the fusion center, at which the final
decision is made based on the information received. Only the leaves are sensors in this
tree architecture.

In this configuration, as shown in Fig. 2.1, the closest sensor to the fusion center is
as far as it could be, in terms of the number of arcs in the path to the root. In this sense,
this configuration is the worst case among all relay trees with N sensors. Moreover, in
contrast to the configuration in [24] and [26] discussed earlier, in our balanced binary
tree we have $\lim_{\tau_N \to \infty} \ell_N/\tau_N = 1/2$ (as opposed to 1 in [24] and [26]).
Hence, the number of times that information is aggregated is essentially as large as the
number of measurements (cf. [24] and [26], in which the number of measurements dominates
the number of fusions). In addition, the height of the tree is $\log N$ (log stands for
the binary logarithm unless otherwise specified in this report), which grows as the number
of sensors increases.
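The ratio 1/2 and the height can be checked by counting nodes level by level. The sketch below assumes $N = 2^k$ sensors and labels the levels $j = 0$ (leaves) to $j = k$ (root); $\tau_N$ and $\ell_N$ denote the total number of nodes and the number of leaf nodes.

```latex
\begin{align*}
\tau_N &= \sum_{j=0}^{k} 2^{\,k-j} = 2^{k+1} - 1 = 2N - 1, \qquad \ell_N = N, \\
\frac{\ell_N}{\tau_N} &= \frac{N}{2N-1} \;\longrightarrow\; \frac{1}{2}
  \quad (N \to \infty), \qquad k = \log_2 N .
\end{align*}
```

In particular, under this count the tree performs $N - 1$ fusions for N measurements.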

Figure 2.1: A balanced binary relay tree with height k and $N = 2^k$ sensors. Circles
represent sensors making measurements. Diamonds represent relay nodes, which fuse binary
messages. The rectangle at the root represents the fusion center making an overall decision.
Nodes at level j carry the error probability pair $(\alpha_j, \beta_j)$, from $(\alpha_0, \beta_0)$
at the sensors to $(\alpha_k, \beta_k)$ at the fusion center.

We assume that all sensors are independent given each hypothesis, and that all sensors
have identical Type I error probability $\alpha_0$ and identical Type II error probability
$\beta_0$. We apply the likelihood-ratio test [41] with threshold 1 as the fusion rule at the
relay nodes and at the fusion center. This fusion rule is locally (but not necessarily
globally) optimal in the case of equally likely hypotheses $H_0$ and $H_1$; i.e., it minimizes
the total error probability locally at each fusion node. In the case where the hypotheses
are not equally likely, the locally optimal fusion rule has a different threshold value,
which is the ratio of the two hypothesis probabilities. However, this complicates the
analysis without bringing any additional insights. Therefore, for simplicity, we henceforth
assume a threshold value of 1 in our analysis. We are interested in the following
questions:

•  What are the Type I and Type II error probabilities as functions of N?

•  Will they converge to 0 at the fusion center?

•  If yes, how fast will they converge with respect to N?

Fusion at a single node receiving information from its two immediate child nodes,
where these have identical Type I error probabilities $\alpha$ and identical Type II error
probabilities $\beta$, provides a detection with Type I and Type II error probabilities
denoted by $(\alpha', \beta')$ and given by [32]:

$$(\alpha', \beta') = f(\alpha, \beta) := \begin{cases} (1 - (1-\alpha)^2,\; \beta^2), & \alpha \le \beta, \\ (\alpha^2,\; 1 - (1-\beta)^2), & \alpha > \beta. \end{cases} \qquad (2.1)$$
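The map in (2.1) can be recovered by enumerating the four possible message pairs and applying the unit-threshold likelihood-ratio test to each. The following Python sketch does this and compares the result with the closed form. The function names are illustrative, the code assumes $\alpha + \beta < 1$, and ties in the likelihood ratio are resolved in favor of $H_1$ (an assumption that only matters when $\alpha = \beta$).

```python
from itertools import product


def fuse_by_lrt(alpha, beta):
    """Fuse two i.i.d. binary messages with a unit-threshold likelihood-ratio test.

    alpha = P(message = 1 | H0), beta = P(message = 0 | H1).
    Returns the (Type I, Type II) error probabilities of the fused message.
    """
    new_alpha = 0.0  # P(decide H1 | H0)
    new_beta = 0.0   # P(decide H0 | H1)
    for messages in product((0, 1), repeat=2):
        p0 = 1.0  # probability of this message pair under H0
        p1 = 1.0  # probability of this message pair under H1
        for bit in messages:
            p0 *= alpha if bit else 1.0 - alpha
            p1 *= (1.0 - beta) if bit else beta
        if p1 >= p0:          # likelihood ratio >= 1: decide H1 (tie -> H1, assumed)
            new_alpha += p0   # deciding H1 is an error when H0 is true
        else:
            new_beta += p1    # deciding H0 is an error when H1 is true
    return new_alpha, new_beta


def fuse_closed_form(alpha, beta):
    """Closed form of the same map, written as in (2.1)."""
    if alpha <= beta:
        return 1.0 - (1.0 - alpha) ** 2, beta ** 2
    return alpha ** 2, 1.0 - (1.0 - beta) ** 2


if __name__ == "__main__":
    for pair in [(0.1, 0.3), (0.3, 0.1)]:
        print(pair, fuse_by_lrt(*pair), fuse_closed_form(*pair))
```

When $\alpha \le \beta$ the test decides $H_1$ as soon as one child reports 1 (an OR rule); when $\alpha > \beta$ it requires both children to report 1 (an AND rule).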

Evidently, as all sensors have the same error probability pair $(\alpha_0, \beta_0)$, all relay
nodes at level 1 will have the same error probability pair $(\alpha_1, \beta_1) = f(\alpha_0, \beta_0)$,
and by recursion,

$$(\alpha_{k+1}, \beta_{k+1}) = f(\alpha_k, \beta_k), \quad k = 0, 1, \ldots, \qquad (2.2)$$

where $(\alpha_k, \beta_k)$ is the error probability pair of nodes at the k-th level of the tree.

Figure 2.2: A trajectory of the sequence $\{(\alpha_k, \beta_k)\}$ in the $(\alpha, \beta)$ plane.

The recursive relation (2.2) allows us to consider the pair of the Type I and II error
probabilities as a discrete dynamic system. In [32], which focuses on the convergence
issues for the total error probability, convergence was proved using Lyapunov methods.
The analysis of the precise evolution of the sequence $\{(\alpha_k, \beta_k)\}$ and the total error
probability decay rate remains open. In this report, we will establish upper and lower
bounds for the total error probability and deduce the precise decay rate of the total error
probability.
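The dynamic system (2.2) is easy to iterate numerically. The Python sketch below is a minimal illustration: the function names and the starting point (0.01, 0.9) are arbitrary choices, and $1 - (1-x)^2$ is evaluated as the algebraically equal $x(2 - x)$ to avoid cancellation when x is very small.

```python
def fuse(alpha, beta):
    """One application of the map f in (2.1)."""
    if alpha <= beta:
        return alpha * (2.0 - alpha), beta * beta
    return alpha * alpha, beta * (2.0 - beta)


def trajectory(alpha0, beta0, levels):
    """Return [(alpha_0, beta_0), ..., (alpha_levels, beta_levels)] from recursion (2.2)."""
    pairs = [(alpha0, beta0)]
    for _ in range(levels):
        pairs.append(fuse(*pairs[-1]))
    return pairs


if __name__ == "__main__":
    for k, (a, b) in enumerate(trajectory(0.01, 0.9, 10)):
        side = "beta >= alpha" if b >= a else "beta < alpha"
        print(f"k={k:2d}  alpha={a:.6g}  beta={b:.6g}  L={a + b:.6g}  ({side})")
```

Printing the side of the line $\beta = \alpha$ at each level makes visible where the pair switches sides.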

To illustrate the ideas, consider first a single trajectory of the dynamic system given
by (2.1), starting at the initial state $(\alpha_0, \beta_0)$. This trajectory is shown in Fig. 2.2.
It exhibits different behaviors depending on its distance from the line $\beta = \alpha$. The
trajectory approaches $\beta = \alpha$ very fast initially, but when $(\alpha_k, \beta_k)$ comes within a
certain neighborhood of the line $\beta = \alpha$, the next pair $(\alpha_{k+1}, \beta_{k+1})$ will appear on the
other side of that line. In the next section, we will establish theorems that characterize
the precise step-by-step behavior of the dynamic system (2.2). In Section 2.4, we derive
upper and lower bounds for (twice) the total error probability $P_N$ at the fusion center as
functions of N. These bounds show that the convergence of the total error probability
is sub-exponential. Specifically, the exponent of $P_N$ is essentially $\sqrt{N}$ (cf. [24], [26],
and [27], where the convergence of the total error probability is exponential in trees
with bounded height; more precisely, under the Neyman-Pearson criterion, the optimal
error exponent is the same as that of the parallel configuration if leaf nodes dominate,
i.e., $\lim_{\tau_N \to \infty} \ell_N/\tau_N = 1$, but under the Bayesian criterion it is worse).

Figure 2.3: (a) Regions $B_1$, $B_2$, and $R_\mathcal{L}$ in the $(\alpha, \beta)$ plane. (b) The trajectory
in Fig. 2.2 superimposed on (a), where solid lines represent boundaries of $B_m$ and dashed
lines represent boundaries of $R$.

2.2  Evolution  of  the  Type  I  and  II  error  probabilities 


The relation (2.1) is symmetric about both of the lines $\alpha + \beta = 1$ and $\beta = \alpha$. Therefore,
it suffices to study the evolution of the dynamic system $\{(\alpha_k, \beta_k)\}$ only in the region
bounded by $\alpha + \beta < 1$ and $\beta \ge \alpha$. We denote by

$\mathcal{U} := \{(\alpha, \beta) \ge 0 \mid \alpha + \beta < 1 \text{ and } \beta \ge \alpha\}$

this triangular region. Similarly, define the complementary triangular region

$\mathcal{L} := \{(\alpha, \beta) \ge 0 \mid \alpha + \beta < 1 \text{ and } \beta < \alpha\}.$

We denote the following region by $B_1$:

$B_1 := \{(\alpha, \beta) \in \mathcal{U} \mid (1 - \alpha)^2 + \beta^2 < 1\}.$

If $(\alpha_k, \beta_k) \in B_1$, then the next pair $(\alpha_{k+1}, \beta_{k+1}) = f(\alpha_k, \beta_k)$ crosses the line
$\beta = \alpha$ to the opposite side from $(\alpha_k, \beta_k)$. More precisely, if $(\alpha_k, \beta_k) \in \mathcal{U}$, then
$(\alpha_k, \beta_k) \in B_1$ if and only if $(\alpha_{k+1}, \beta_{k+1}) = f(\alpha_k, \beta_k) \in \mathcal{L}$. In other words,
$B_1$ is the inverse image of $\mathcal{L}$ under the mapping $f$ in $\mathcal{U}$. The set $B_1$ is shown in
Fig. 2.3(a). Fig. 2.3(b) illustrates this behavior of the trajectory for the example in
Fig. 2.2. For instance, as shown in Fig. 2.3(b), if the state is at point 1 in $B_1$, then it
jumps to the next state, point 2, on the other side of $\beta = \alpha$.

Denote the following region by $B_2$:

$B_2 := \{(\alpha, \beta) \in \mathcal{U} \mid (1 - \alpha)^2 + \beta^2 \ge 1 \text{ and } (1 - \alpha)^4 + \beta^4 < 1\}.$

It is easy to show that if $(\alpha_k, \beta_k) \in \mathcal{U}$, then $(\alpha_k, \beta_k) \in B_2$ if and only if
$(\alpha_{k+1}, \beta_{k+1}) = f(\alpha_k, \beta_k) \in B_1$. In other words, $B_2$ is the inverse image of $B_1$ in
$\mathcal{U}$ under the mapping $f$. The behavior of $f$ is illustrated in the movement from point 0
to point 1 in Fig. 2.3(b). The set $B_2$ is identified in Fig. 2.3(a), lying directly above $B_1$.

Figure 2.4: Upper boundaries of $B_1$, $B_2$, and $R_\mathcal{U}$.

Now for an integer $m > 1$, recursively define $B_m$ to be the inverse image of $B_{m-1}$
under the mapping $f$. It is easy to see that

$B_m := \{(\alpha, \beta) \in \mathcal{U} \mid (1 - \alpha)^{2^{m-1}} + \beta^{2^{m-1}} \ge 1 \text{ and } (1 - \alpha)^{2^m} + \beta^{2^m} < 1\}.$

Notice that $\mathcal{U} = \bigcup_{m=1}^{\infty} B_m$. Hence, for any $(\alpha_0, \beta_0) \in \mathcal{U}$, there exists m such
that $(\alpha_0, \beta_0) \in B_m$. This gives a complete description of how the dynamics of the system
behave in the upper triangular region $\mathcal{U}$. For instance, if the initial pair $(\alpha_0, \beta_0)$ lies
in $B_m$, then the system evolves in the order

$B_m \to B_{m-1} \to \cdots \to B_2 \to B_1.$

Therefore, the system will enter $B_1$ after $m - 1$ levels of fusion; i.e.,
$(\alpha_{m-1}, \beta_{m-1}) \in B_1$.
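The region index m of a point can be found by repeated squaring, which also gives a direct numerical check of the chain of regions. The Python sketch below is illustrative only: `region_index` and its `max_m` cap are hypothetical helpers, and the code assumes a point of the upper triangle with $\alpha > 0$, using strict and non-strict inequalities as in the definition of $B_m$ above.

```python
def fuse(alpha, beta):
    """One application of the map f in (2.1)."""
    if alpha <= beta:
        return alpha * (2.0 - alpha), beta * beta
    return alpha * alpha, beta * (2.0 - beta)


def region_index(alpha, beta, max_m=64):
    """Return m such that (alpha, beta) lies in B_m.

    Uses the fact that (1-alpha)^(2^m) + beta^(2^m) is non-increasing in m,
    so the index is the first m at which this sum drops below 1.
    """
    if not (0.0 < alpha <= beta and alpha + beta < 1.0):
        raise ValueError("expected a point of the upper triangle with alpha > 0")
    x, y = 1.0 - alpha, beta
    for m in range(1, max_m + 1):
        x, y = x * x, y * y  # now x = (1-alpha)^(2^m), y = beta^(2^m)
        if x + y < 1.0:
            return m
    raise ValueError("no region found within max_m levels")


if __name__ == "__main__":
    a, b = 0.01, 0.9
    while a <= b:  # follow the trajectory until it crosses beta = alpha
        print(f"({a:.6g}, {b:.6g}) lies in B_{region_index(a, b)}")
        a, b = fuse(a, b)
```

For the starting point used here the printed indices decrease by one at each level until the pair leaves the upper triangle.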


As the next stage, we consider the behavior of the system after it enters $B_1$. The
image of $B_1$ under the mapping $f$, denoted by $R_\mathcal{L}$, is (see Fig. 2.3(a))

$R_\mathcal{L} := \{(\alpha, \beta) \in \mathcal{L} \mid \sqrt{1 - \alpha} + \sqrt{\beta} \ge 1\}.$

We can define the reflection of $B_m$ about the line $\beta = \alpha$ in a similar way for all
m. Similarly, we denote by $R_\mathcal{U}$ the reflection of $R_\mathcal{L}$ about the line $\beta = \alpha$; i.e.,

$R_\mathcal{U} := \{(\alpha, \beta) \in \mathcal{U} \mid \sqrt{1 - \beta} + \sqrt{\alpha} \ge 1\}.$

We denote the region $R_\mathcal{U} \cup R_\mathcal{L}$ by $R$. We will show that $R$ is an invariant region in the
sense that once the dynamic system enters $R$, it stays there. For example, as shown in
Fig. 2.3(b), the system after point 1 stays inside $R$.
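The invariance of $R$ can be probed numerically before it is proved. The Python sketch below samples random points of $R$ and follows each for a few levels of fusion. It is a sanity check under stated assumptions, not a proof: the sample size, step count, seed, and the small tolerance `tol` (which absorbs floating-point rounding on the boundary) are arbitrary choices.

```python
import math
import random


def fuse(alpha, beta):
    """One application of the map f in (2.1)."""
    if alpha <= beta:
        return alpha * (2.0 - alpha), beta * beta
    return alpha * alpha, beta * (2.0 - beta)


def in_R(alpha, beta, tol=0.0):
    """Membership test for R, the union of the upper and lower parts defined above."""
    if not (alpha >= 0.0 and beta >= 0.0 and alpha + beta < 1.0 + tol):
        return False
    if beta >= alpha:  # upper triangle: sqrt(1 - beta) + sqrt(alpha) >= 1
        return math.sqrt(1.0 - beta) + math.sqrt(alpha) >= 1.0 - tol
    return math.sqrt(1.0 - alpha) + math.sqrt(beta) >= 1.0 - tol  # lower triangle


def stays_in_R(samples=2000, steps=8, seed=0):
    """Return True if every sampled point of R remains in R for `steps` fusions."""
    rng = random.Random(seed)
    checked = 0
    while checked < samples:
        a, b = rng.random(), rng.random()
        if not in_R(a, b):
            continue  # rejection sampling: keep only starting points inside R
        checked += 1
        for _ in range(steps):
            a, b = fuse(a, b)
            if not in_R(a, b, tol=1e-12):
                return False
    return True


if __name__ == "__main__":
    print("all sampled trajectories stay in R:", stays_in_R())
```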
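Proposition 2.2.1. If $(\alpha_{k_0}, \beta_{k_0}) \in R$ for some $k_0$, then $(\alpha_k, \beta_k) \in R$ for all $k \ge k_0$.

Proof. First we show that $B_1 \subset R_\mathcal{U} \subset B_1 \cup B_2$.

Notice that $B_1$, $R_\mathcal{U}$, and $B_1 \cup B_2$ share the same lower boundary $\beta = \alpha$. It suffices
to show that the upper boundary of $R_\mathcal{U}$ lies between the upper boundary of $B_2$ and that
of $B_1$ (see Fig. 2.4).

First, we show that the upper boundary of $R_\mathcal{U}$ lies above the upper boundary of $B_1$.
We have

$1 - (1 - \sqrt{\alpha})^2 \ge \sqrt{1 - (1 - \alpha)^2}$
$\iff 2\sqrt{\alpha} - \alpha \ge \sqrt{2\alpha - \alpha^2}$
$\iff \alpha^2 + \alpha - 2\alpha^{3/2} \ge 0,$

which holds for all $\alpha$ in $[0, 1)$. Thus, $B_1 \subset R_\mathcal{U}$.

Now we prove that the upper boundary of $R_\mathcal{U}$ lies below that of $B_2$. We have

$(1 - (1 - \alpha)^4)^{1/4} \ge 1 - (1 - \sqrt{\alpha})^2$
$\iff 1 - (1 - \alpha)^4 \ge (2\sqrt{\alpha} - \alpha)^4$
$\iff -2(\sqrt{\alpha} - 1)^2 \alpha \left(\alpha^2 - 2\alpha^{3/2} + 5\alpha - 4\sqrt{\alpha} - 2\right) \ge 0,$

which holds for all $\alpha$ in $[0, 1)$ as well. Hence, $R_\mathcal{U} \subset B_1 \cup B_2$.

Without loss of generality, we assume that $(\alpha_{k_0}, \beta_{k_0}) \in R_\mathcal{U}$. This means that
$(\alpha_{k_0}, \beta_{k_0}) \in B_1$ or $(\alpha_{k_0}, \beta_{k_0}) \in B_2 \cap R_\mathcal{U}$. If $(\alpha_{k_0}, \beta_{k_0}) \in B_1$, then the
next pair $(\alpha_{k_0+1}, \beta_{k_0+1})$ lies in $R_\mathcal{L}$. If $(\alpha_{k_0}, \beta_{k_0}) \in B_2 \cap R_\mathcal{U}$, then
$(\alpha_{k_0+1}, \beta_{k_0+1}) \in B_1 \subset R_\mathcal{U}$ and $(\alpha_{k_0+2}, \beta_{k_0+2}) \in R_\mathcal{L}$. By symmetry
considerations, it follows that the system stays inside $R$ for all $k \ge k_0$.

□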

So far we have studied the precise evolution of the sequence $\{(\alpha_k, \beta_k)\}$ in the
$(\alpha, \beta)$ plane. In the next section, we will consider the step-wise reduction in the total
error probability and deduce upper and lower bounds for it.


2.3  Error  Probability  Bounds 


In this section, we will first derive bounds for the total error probability in the case of
equally likely hypotheses, where the fusion rule is the likelihood-ratio test with unit
threshold. Then we will deduce bounds for the total error probability in the case where
the prior probabilities are unequal but the fusion rule remains the same.

The total error probability for a node with $(\alpha_k, \beta_k)$ is $(\alpha_k + \beta_k)/2$ in the case of
equal prior probabilities. Let $L_k = \alpha_k + \beta_k$, namely, twice the total error probability.
Analysis of the total error probability results from consideration of the sequence $\{L_k\}$.
In fact, we will derive bounds on $\log L_k^{-1}$, whose growth rate is related to the rate of
convergence of $L_k$ to 0. We divide our analysis into two parts:

I  We will study the shrinkage of the total error probability as the system propagates
from $B_m$ to $B_1$;

II  We will study the shrinkage of the total error probability after the system enters
$B_1$.

2.3.1  Case I: analysis as the system propagates from $B_m$ to $B_1$

Suppose that the initial state $(\alpha_0, \beta_0)$ lies in $B_m$, where m is a positive integer and
$m \ne 1$. From the previous analysis, $(\alpha_{m-1}, \beta_{m-1}) \in B_1$. In this section, we study
the rate of reduction of the total error probability as the system propagates from $B_m$ to
$B_1$.

Proposition 2.3.1. Suppose that $(\alpha_k, \beta_k) \in B_m$, where m is a positive integer and
$m \ne 1$. Then,

$1 \le \frac{L_{k+1}}{L_k^2} \le 2.$

The proof is given in Appendix A. Fig. 2.5 shows a plot of values of $L_{k+1}/L_k^2$ in
$\bigcup_{m=2}^{\infty} B_m$. With the recursive relation given in Proposition 2.3.1, we can derive the
following bounds for $\log L_k^{-1}$.

Figure 2.5: Ratio $L_{k+1}/L_k^2$ in $\bigcup_{m=2}^{\infty} B_m$. Each line depicts the ratio versus $\alpha_k$ for a
fixed $\beta_k$.
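In the spirit of Fig. 2.5, the ratio in Proposition 2.3.1 can be spot-checked on random points. The Python sketch below is a numerical sanity check rather than a substitute for the proof in Appendix A; the helper names, sample size, and seed are arbitrary, and the region test uses the description of $\bigcup_{m \ge 2} B_m$ as the points of the upper triangle with $(1-\alpha)^2 + \beta^2 \ge 1$.

```python
import random


def ratio_one_step(alpha, beta):
    """L_{k+1} / L_k^2 for a point with alpha <= beta, using the first branch of (2.1)."""
    next_total = alpha * (2.0 - alpha) + beta * beta  # alpha' + beta'
    return next_total / (alpha + beta) ** 2


def in_upper_regions(alpha, beta):
    """True if the point lies in the union of B_m over m >= 2."""
    return (0.0 < alpha <= beta and alpha + beta < 1.0
            and (1.0 - alpha) ** 2 + beta ** 2 >= 1.0)


def sample_ratios(count=5000, seed=1):
    """Collect the one-step ratio at `count` random points of the union of B_m, m >= 2."""
    rng = random.Random(seed)
    ratios = []
    while len(ratios) < count:
        a, b = rng.random(), rng.random()
        if in_upper_regions(a, b):
            ratios.append(ratio_one_step(a, b))
    return ratios


if __name__ == "__main__":
    values = sample_ratios()
    print("min ratio:", min(values), " max ratio:", max(values))
```

The lower bound reflects the identity $L_{k+1} - L_k^2 = 2\alpha_k(1 - \alpha_k - \beta_k) \ge 0$ in the upper triangle.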

Proposition 2.3.2. Suppose that $(\alpha_0, \beta_0) \in B_m$, where m is a positive integer and
$m \ne 1$. Then, for $k = 1, 2, \ldots, m - 1$,

$2^k \left(\log L_0^{-1} - 1\right) \le \log L_k^{-1} \le 2^k \log L_0^{-1}.$

The proof is given in Appendix B. Suppose that the balanced binary relay tree has
N leaf nodes. Then, the height of the fusion center is $\log N$. For convenience, let
$P_N = L_{\log N}$ be (twice) the total error probability at the fusion center. Substituting
$k = \log N$ into Proposition 2.3.2, we get the following result.

Corollary 2.3.1. Suppose that $(\alpha_0, \beta_0) \in B_m$, where m is a positive integer and
$m \ne 1$. If $\log N < m$, then

$N \left(\log L_0^{-1} - 1\right) \le \log P_N^{-1} \le N \log L_0^{-1}.$

Notice that the lower bound of $\log P_N^{-1}$ is useful only if $L_0 < 1/2$. Next we derive
a lower bound for $\log P_N^{-1}$ which is useful for all $L_0 \in (0, 1)$.

Proposition 2.3.3. Suppose that $(\alpha_k, \beta_k) \in B_m$, where m is a positive integer and
$m \ne 1$. Then,

$\frac{L_{k+1}}{L_k^{\sqrt{2}}} \le 1.$

The proof is given in Appendix C. Fig. 2.6 shows a plot of values of $L_{k+1}/L_k^{\sqrt{2}}$ in
$\bigcup_{m=2}^{\infty} B_m$. With the inequality given in Proposition 2.3.3, we can derive a new lower
bound for $\log P_N^{-1}$, which is useful for all $L_0 \in (0, 1)$.

Figure 2.6: Ratio $L_{k+1}/L_k^{\sqrt{2}}$ in $\bigcup_{m=2}^{\infty} B_m$. Each line depicts the ratio versus $\alpha_k$ for a
fixed $\beta_k$.
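The inequality of Proposition 2.3.3 can be spot-checked in the same way as Fig. 2.6 suggests. The Python sketch below samples points with $\alpha \le \beta$, $\alpha + \beta < 1$ and $(1-\alpha)^2 + \beta^2 \ge 1$ and records the largest observed ratio. The names, sample size, and seed are arbitrary assumptions, and the check does not replace the proof in Appendix C.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def ratio_sqrt2(alpha, beta):
    """L_{k+1} / L_k^sqrt(2) for a point with alpha <= beta."""
    next_total = alpha * (2.0 - alpha) + beta * beta  # alpha' + beta' from (2.1)
    return next_total / (alpha + beta) ** SQRT2


def in_upper_regions(alpha, beta):
    """True if the point lies in the union of B_m over m >= 2."""
    return (0.0 < alpha <= beta and alpha + beta < 1.0
            and (1.0 - alpha) ** 2 + beta ** 2 >= 1.0)


def max_sampled_ratio(count=5000, seed=2):
    """Largest value of the ratio over `count` random points of the region."""
    rng = random.Random(seed)
    worst = 0.0
    accepted = 0
    while accepted < count:
        a, b = rng.random(), rng.random()
        if in_upper_regions(a, b):
            accepted += 1
            worst = max(worst, ratio_sqrt2(a, b))
    return worst


if __name__ == "__main__":
    print("largest sampled ratio:", max_sampled_ratio())
```

The ratio approaches 1 near the corner where the boundary of $B_1$ meets the line $\alpha + \beta = 1$, so sampled maxima close to 1 are to be expected.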

Proposition 2.3.4. Suppose that $(\alpha_0, \beta_0) \in B_m$, where m is a positive integer and
$m \ne 1$. If $\log N < m$, then

$\log P_N^{-1} \ge \sqrt{N} \log L_0^{-1}.$

The  proof  is  given  in  Appendix  D. 
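The full argument is deferred to Appendix D. One way to see where the factor $\sqrt{N}$ comes from is sketched below. It assumes that Proposition 2.3.3 can be applied at every level $k = 0, 1, \ldots, \log N - 1$, which is the case when $\log N < m$ because then $(\alpha_k, \beta_k) \in B_{m-k}$ with $m - k \ge 2$.

```latex
\begin{align*}
L_{k+1} \le L_k^{\sqrt{2}}
  &\;\Longrightarrow\; \log L_{k+1}^{-1} \ge \sqrt{2}\,\log L_k^{-1} \\
  &\;\Longrightarrow\; \log L_k^{-1} \ge \bigl(\sqrt{2}\bigr)^{k}\,\log L_0^{-1}, \\
k = \log N
  &\;\Longrightarrow\; \log P_N^{-1} = \log L_{\log N}^{-1}
     \ge 2^{(\log N)/2}\,\log L_0^{-1} = \sqrt{N}\,\log L_0^{-1}.
\end{align*}
```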
2.3.2  Case II: analysis when the system stays inside R

We have derived error probability bounds up until the point where the trajectory of
the system enters $B_1$. In this section, we consider the total error probability reduction
from that point on. First we will establish error probability bounds for even-height
trees. Then we will deduce error probability bounds for odd-height trees.


Error  probability  bounds  for  even-height  trees 


If $(\alpha_0, \beta_0) \in B_m$ for some $m \neq 1$, then $(\alpha_{m-1}, \beta_{m-1}) \in B_1$. The system afterward
stays inside the invariant region $R$ (but not necessarily inside $B_1$). Hence, the decay
rate of the total error probability in the invariant region $R$ determines the asymptotic
decay rate. Without loss of generality, we assume that $(\alpha_0, \beta_0)$ lies in the invariant
region $R$. In contrast to Proposition 2.3.1, which bounds the ratio $L_{k+1}/L_k^2$, we will
bound the ratio associated with taking two steps.

Proposition 2.3.5. Suppose that $(\alpha_k, \beta_k) \in R$. Then,

$$1 \le \frac{L_{k+2}}{L_k^2} \le 2.$$


The proof is given in Appendix E. Fig. 2.7(a) and Fig. 2.7(b) show plots of values
of $L_{k+2}/L_k^2$ in $B_1$ and $B_2 \cap R_U$, respectively.

Figure 2.7: (a) Ratio $L_{k+2}/L_k^2$ in $B_1$. (b) Ratio $L_{k+2}/L_k^2$ in $B_2 \cap R_U$. Each line
depicts the ratio versus $\alpha_k$ for a fixed $\beta_k$.

Proposition 2.3.5 gives bounds on the relationship between $L_k$ and $L_{k+2}$ in the
invariant region $R$. Hence, in the special case of trees with even height, that is, when
$\log N$ is an even integer, it is easy to bound $P_N$ in terms of $L_0$. In fact, we will bound
$\log P_N^{-1}$, which in turn provides bounds for $P_N$.

Theorem 2.3.1. Suppose that $(\alpha_0, \beta_0) \in R$ and $\log N$ is even. Then,

$$\sqrt{N} (\log L_0^{-1} - 1) \le \log P_N^{-1} \le \sqrt{N} \log L_0^{-1}.$$

Proof. If $(\alpha_0, \beta_0) \in R$, then we have $(\alpha_k, \beta_k) \in R$ for $k = 0, 1, \ldots, \log N - 2$. From
Proposition 2.3.5, we have

$$L_{k+2} = a_{k/2} L_k^2$$

for $k = 0, 2, \ldots, \log N - 2$ and some $a_{k/2} \in [1, 2]$. Therefore, for $k = 2, 4, \ldots, \log N$,
we have

$$L_k = a_{(k-2)/2} \cdot a_{(k-4)/2}^{2} \cdots a_0^{2^{(k-2)/2}} L_0^{2^{k/2}},$$

where $a_i \in [1, 2]$ for each $i$. Substituting $k = \log N$, we have

$$P_N = a_{(k-2)/2} \cdot a_{(k-4)/2}^{2} \cdots a_0^{2^{(k-2)/2}} L_0^{2^{\log \sqrt{N}}}
      = a_{(k-2)/2} \cdot a_{(k-4)/2}^{2} \cdots a_0^{\sqrt{N}/2} L_0^{\sqrt{N}}.$$

Hence,

$$\log P_N^{-1} = -\log a_{(k-2)/2} - 2 \log a_{(k-4)/2} - \cdots - \frac{\sqrt{N}}{2} \log a_0 + \sqrt{N} \log L_0^{-1}.$$

Notice that $\log L_0^{-1} > 0$ and $0 \le \log a_i \le 1$ for each $i$. Thus,

$$\log P_N^{-1} \le \sqrt{N} \log L_0^{-1}.$$

Finally,

$$\log P_N^{-1} \ge -1 - 2 - \cdots - \frac{\sqrt{N}}{2} + \sqrt{N} \log L_0^{-1}
  \ge -\sqrt{N} + \sqrt{N} \log L_0^{-1} = \sqrt{N} (\log L_0^{-1} - 1).$$

□
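The bounds of Theorem 2.3.1 can be checked numerically. The sketch below is not taken from the report; it assumes that the fusion update is the two-case rule the report restates later as (3.3), that logarithms are base 2, and that the hypothetical starting point $(\alpha_0, \beta_0) = (0.1, 0.25)$ lies in $R$ (it satisfies $1-(1-\alpha_0)^2 \le \beta_0 \le \sqrt{\alpha_0}$, the region $S$ of Section 2.3.3). All function and variable names are illustrative.

```python
import math

def fuse(alpha, beta):
    """One level of unit-threshold likelihood-ratio fusion of two identical children."""
    if alpha <= beta:
        # alpha*(2 - alpha) equals 1 - (1 - alpha)**2 but stays accurate for tiny alpha
        return alpha * (2.0 - alpha), beta * beta
    return alpha * alpha, beta * (2.0 - beta)

def total_error_path(alpha0, beta0, height):
    """Return [L_0, ..., L_height], where L_k = alpha_k + beta_k."""
    a, b = alpha0, beta0
    path = [a + b]
    for _ in range(height):
        a, b = fuse(a, b)
        path.append(a + b)
    return path

# Hypothetical starting point (an assumption of this sketch, not a value from the report)
ALPHA0, BETA0 = 0.1, 0.25
PATH = total_error_path(ALPHA0, BETA0, 12)
LOG_INV_L0 = -math.log2(PATH[0])

def bounds_row(k):
    """(lower bound, log2(1/P_N), upper bound) for an even height k, with N = 2**k."""
    sqrt_n = 2.0 ** (k / 2)
    return sqrt_n * (LOG_INV_L0 - 1.0), -math.log2(PATH[k]), sqrt_n * LOG_INV_L0

for k in range(2, 13, 2):
    lo, mid, hi = bounds_row(k)
    print(f"height {k:2d}: {lo:8.2f} <= {mid:8.2f} <= {hi:8.2f}")
```

The height is capped at 12 only to keep the floating-point values well away from underflow; the two-step ratio $L_{k+2}/L_k^2$ can be read off `PATH` in the same way.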


Notice the lower bound for $\log P_N^{-1}$ in Theorem 2.3.1 is useful only if $L_0 < 1/2$.
We further provide a lower bound for $\log P_N^{-1}$ which is useful for all $L_0 \in (0, 1)$.

Proposition 2.3.6. Suppose that $(\alpha_k, \beta_k) \in R$. Then,

$$\frac{L_{k+2}}{L_k^{\sqrt{2}}} \le 1.$$


The proof is given in Appendix F. Fig. 2.8(a) and Fig. 2.8(b) show plots of the ratio
$L_{k+2}/L_k^{\sqrt{2}}$ inside $B_1$ and $B_2 \cap R_U$, respectively. Next we derive a new lower bound
for $\log P_N^{-1}$.

Proposition 2.3.7. Suppose that $(\alpha_0, \beta_0) \in R$ and $\log N$ is even. Then,

$$\log P_N^{-1} \ge \sqrt[4]{N} \log L_0^{-1}.$$

The proof is given in Appendix G.

Figure 2.8: (a) Ratio $L_{k+2}/L_k^{\sqrt{2}}$ in $B_1$. (b) Ratio $L_{k+2}/L_k^{\sqrt{2}}$ in $B_2 \cap R_U$. Each line
depicts the ratio versus $\alpha_k$ for a fixed $\beta_k$.


Error  probability  bounds  for  odd-height  trees 


Next we explore the case of trees with odd height; i.e., $\log N$ is an odd integer. Assume
that $(\alpha_0, \beta_0)$ lies in the invariant region $R$. First, we will establish general bounds for
odd-height trees. Then we deduce bounds for the case where there exists $(\alpha_m, \beta_m) \in
B_2 \cap R_U$ for some $m \in \{0, 1, \ldots, \log N - 2\}$.


For  odd-height  trees,  we  need  to  know  how  much  the  total  error  probability  is 
reduced  by  moving  up  one  level  in  the  tree. 

Proposition 2.3.8. Suppose that $(\alpha_k, \beta_k) \in U$. Then,

$$1 \le \frac{L_{k+1}}{L_k^2}$$

and

$$\frac{L_{k+1}}{L_k} \le 1.$$

The proof is given in Appendix H. Fig. 2.9(a) and Fig. 2.9(b) show plots of values
of $L_{k+1}/L_k^2$ and $L_{k+1}/L_k$ in $U$.

Using Propositions 2.3.5 and 2.3.8, we now derive error probability
bounds for odd-height trees.

Theorem 2.3.2. Suppose that $(\alpha_0, \beta_0) \in R$ and $\log N$ is odd. Then

$$\sqrt{N/2}\, (\log L_0^{-1} - 1) \le \log P_N^{-1} \le \sqrt{2N} \log L_0^{-1}.$$

The proof is given in Appendix I. Next we consider the special case where there
exists $m \in \{0, 1, \ldots, \log N - 2\}$ such that $(\alpha_m, \beta_m) \in B_2 \cap R_U$.

Figure 2.9: (a) Ratio $L_{k+1}/L_k^2$ in $U$. (b) Ratio $L_{k+1}/L_k$ in $U$. Each line depicts the
ratio versus $\alpha_k$ for a fixed $\beta_k$.


Proposition 2.3.9. Suppose that $(\alpha_k, \beta_k) \in B_1$ and $(\alpha_{k-1}, \beta_{k-1}) \in B_2 \cap R_U$. Then,

$$\frac{1}{2} \le \frac{L_{k+1}}{L_k} \le 1.$$

The proof is given in Appendix J. Fig. 2.10 shows a plot of values of $L_{k+1}/L_k$ in
this case.

We have proved in Proposition 2.3.5 that if $(\alpha_k, \beta_k)$ is in $B_2 \cap R_U$, then the ratio
$L_{k+2}/L_k^2 \in [1, 2]$. However, if we analyze each level of fusion, it can be seen that the
total error probability decreases exponentially fast from $B_2 \cap R_U$ to $B_1$ (Proposition
2.3.1). Proposition 2.3.9 tells us that the fusion from $B_1$ to $R_L$ is a bad step, which
does not contribute significantly to decreasing the total error probability.

We  can  now  provide  bounds  for  the  total  error  probability  at  the  fusion  center. 

Theorem 2.3.3. Suppose that $(\alpha_0, \beta_0) \in R$, $\log N$ is an odd integer, and there exists
$m \in \{0, 1, \ldots, \log N - 2\}$ such that $(\alpha_m, \beta_m) \in B_2 \cap R_U$.

If $m$ is even, then

$$\sqrt{2N} (\log L_0^{-1} - 1) \le \log P_N^{-1} \le \sqrt{2N} \log L_0^{-1}.$$

If $m$ is odd, then


The proof is given in Appendix K. Finally, by combining all of the analysis above
for step-wise reduction of the total error probability, we can write general bounds when
the initial error probability pair $(\alpha_0, \beta_0)$ lies inside $B_m$, where $m \neq 1$.

Figure 2.10: Ratio $L_{k+1}/L_k$ in the region $f(B_2 \cap R_U)$. Each line depicts the ratio
versus $\alpha$ for a fixed $\beta$.


Theorem 2.3.4. Suppose that $(\alpha_0, \beta_0) \in B_m$, where $m$ is an integer and $m \neq 1$.

If $\log N < m$, then (Corollary 2.3.1)

$$N (\log L_0^{-1} - 1) \le \log P_N^{-1} \le N \log L_0^{-1}.$$

If $\log N \ge m$, and $\log N - m$ is odd, then

$$\sqrt{2^{m-1} N} (\log L_0^{-1} - 1) \le \log P_N^{-1} \le \sqrt{2^{m-1} N} \log L_0^{-1}.$$

If $\log N \ge m$, and $\log N - m$ is even, then

$$\sqrt{2^{m-2} N} (\log L_0^{-1} - 1) \le \log P_N^{-1} \le \sqrt{2^{m} N} \log L_0^{-1}.$$

The proof uses arguments similar to those of Theorem 2.3.1 and is provided in
Appendix L.

Remark: Notice again that the lower bounds for $\log P_N^{-1}$ above are useful only if
$L_0 < 1/2$. However, similar to Proposition 2.3.7, we can derive a lower bound for
$\log P_N^{-1}$, which is useful for all $L_0 \in (0, 1)$. It turns out that this lower bound differs
from that in Proposition 2.3.7 by a constant term. Therefore, it is omitted.


2.3.3 Invariant region in $B_1$

Consider the region $\{(\alpha, \beta) \in U \mid \beta \le \sqrt{\alpha} \text{ and } \beta \ge 1 - (1 - \alpha)^2\}$, which is a subset
of $B_1$ (see Fig. 2.11(a)). Denote the union of this region and its reflection with respect
to $\beta = \alpha$ by $S$. It turns out that $S$ is also invariant.

Proposition 2.3.10. If $(\alpha_{k_0}, \beta_{k_0}) \in S$, then $(\alpha_k, \beta_k) \in S$ for all $k \ge k_0$.


The  proof  is  given  in  Appendix  M.  Fig.  2.11(b)  shows  a  single  trajectory  of  the 
dynamic  system  which  stays  inside  S. 


Figure 2.11: (a) Invariant region $S$ (between dashed lines) lies inside $B_1$ (between solid
lines). (b) A trajectory of the system which stays inside $S$.


We have given bounds for $P_N$, which is (twice) the total error probability. It turns
out that for the case where $(\alpha_0, \beta_0) \in S$, we can bound the Type I and Type II errors
individually.

Proposition 2.3.11. If $(\alpha_k, \beta_k) \in S$, then

$$1 \le \frac{\alpha_{k+2}}{\alpha_k^2} \le 4$$

and

$$1 \le \frac{\beta_{k+2}}{\beta_k^2} \le 4.$$

The proof is omitted.

Remark: It is easy to see that as long as the system stays inside $B_1$, then in a similar
vein, these ratios $\alpha_{k+2}/\alpha_k^2$ and $\beta_{k+2}/\beta_k^2$ are lower bounded by 1 and upper bounded
by a constant. But recall that $B_1$ is not an invariant region. Thus, it is more interesting
to consider $S$.
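The individual ratios can be inspected numerically along a trajectory that starts in $S$. This is only an illustrative sketch: the update rule is assumed to be the two-case fusion rule restated as (3.3), and the starting point $(0.1, 0.25)$ is a hypothetical choice satisfying $1-(1-\alpha_0)^2 \le \beta_0 \le \sqrt{\alpha_0}$.

```python
def fuse(alpha, beta):
    """Unit-threshold likelihood-ratio fusion of two identical children (assumed rule)."""
    if alpha <= beta:
        # alpha*(2 - alpha) is 1 - (1 - alpha)**2 written in a numerically stable form
        return alpha * (2.0 - alpha), beta * beta
    return alpha * alpha, beta * (2.0 - beta)

def trajectory(alpha0, beta0, levels):
    """Return the lists [alpha_0..alpha_levels] and [beta_0..beta_levels]."""
    alphas, betas = [alpha0], [beta0]
    for _ in range(levels):
        a, b = fuse(alphas[-1], betas[-1])
        alphas.append(a)
        betas.append(b)
    return alphas, betas

# Hypothetical starting point inside S (an assumption of this sketch)
ALPHAS, BETAS = trajectory(0.1, 0.25, 12)
ALPHA_RATIOS = [ALPHAS[k + 2] / ALPHAS[k] ** 2 for k in range(11)]
BETA_RATIOS = [BETAS[k + 2] / BETAS[k] ** 2 for k in range(11)]

for k, (ra, rb) in enumerate(zip(ALPHA_RATIOS, BETA_RATIOS)):
    print(f"k={k:2d}  alpha ratio={ra:.4f}  beta ratio={rb:.4f}")
```

Along this particular trajectory the two ratios approach 4 and 2 in alternation, which suggests that the upper constant 4 cannot be lowered in general.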


Proofs are omitted because they are along similar lines to those in the other proofs.
As before, these inequalities give rise to bounds on sequences $\{\alpha_k\}$ and $\{\beta_k\}$. For
example, for $\{\alpha_k\}$, we have the following.

Corollary 2.3.2. If $(\alpha_0, \beta_0) \in S$ and $k$ is even, then

$$2^{k/2} (\log \alpha_0^{-1} - 2) \le \log \alpha_k^{-1} \le 2^{k/2} \log \alpha_0^{-1}.$$

2.3.4 Unequally likely hypotheses


In this section we consider the situation of unequally likely hypotheses; that is, $P(H_0) \neq
P(H_1)$. Suppose that the fusion rule is as before: the likelihood ratio test with unit
threshold. The resulting total error probability for the nodes at level $k$ is equal to
$\hat{L}_k = P(H_0)\alpha_k + P(H_1)\beta_k$, and the total error probability at the fusion center is
$\hat{P}_N = \hat{L}_{\log N}$. We are interested in bounds for $\hat{P}_N$.

Because the fusion rule is the same as before, the previous bounds for $\log P_N^{-1}$
hold. From these bounds, we now derive bounds for $\hat{P}_N$. Without loss of generality,
we assume that $P(H_0) \le P(H_1)$. We obtain the following:

$$P(H_0) L_k \le P(H_0)\alpha_k + P(H_1)\beta_k \le P(H_1) L_k.$$

From these inequalities, we can derive upper and lower bounds for $\log \hat{P}_N^{-1}$. For
example, in the case where $(\alpha_0, \beta_0) \in R$ and $\log N$ is even (even-height tree), from
Theorem 2.3.1, we have

$$\sqrt{N} (\log L_0^{-1} - 1) \le \log P_N^{-1} \le \sqrt{N} \log L_0^{-1},$$

from which we obtain

$$\sqrt{N} (\log L_0^{-1} - 1) + \log P(H_1)^{-1} \le \log \hat{P}_N^{-1} \le \sqrt{N} \log L_0^{-1} + \log P(H_0)^{-1}.$$

We  have  derived  error  probability  bounds  for  balanced  binary  relay  trees  under 
several  scenarios.  In  the  next  section,  we  will  use  these  bounds  to  study  the  asymptotic 
rate  of  convergence. 


2.4  Asymptotic  Rates 


We first consider the asymptotic decay rate of the total error probability with respect
to $N$ while the performance of the sensors is held constant. Then
we allow the sensors to be asymptotically crummy, in the sense that $\alpha_0 + \beta_0 \to 1$. We
prove that the total error probability still converges to 0 under certain conditions. Last,
we compare the detection performance of different strategies in balanced
binary relay trees.

In this section, we use the following notation: for positive functions $f$ and $g$ defined
on the positive integers, if there exist positive constants $c_1$ and $c_2$ such that $c_1 g(N) \le
f(N) \le c_2 g(N)$ for all sufficiently large $N$, then we write $f(N) = \Theta(g(N))$. For
$N \to \infty$, the notation $f(N) \sim g(N)$ means that $f(N)/g(N) \to 1$, $f(N) = \omega(g(N))$
means that $f(N)/g(N) \to \infty$, and $f(N) = o(g(N))$ means that $f(N)/g(N) \to 0$.

2.4.1  Asymptotic  decay  rate 

Notice that as $N$ becomes large, the sequence $\{(\alpha_k, \beta_k)\}$ will eventually move into the
invariant region $R$ at some level and stay inside from that point on. Therefore, it suffices
to consider the decay rate in the invariant region $R$. Because error probability bounds
for trees with odd height differ from those of the even-height tree by a constant term,
without loss of generality, we will only consider trees with even height.

Proposition 2.4.1. If $L_0 = \alpha_0 + \beta_0$ is fixed, then

$$\log P_N^{-1} = \Theta(\sqrt{N}).$$

Proof. If $L_0 = \alpha_0 + \beta_0$ is fixed, then by Proposition 2.3.7 we immediately see that
$P_N \to 0$ as $N \to \infty$ ($\log P_N^{-1} \to \infty$) and there exists a finite $k$ such that $L_k < 1/2$.
To analyze the asymptotic rate, we may assume that $L_0 < 1/2$. In this case, the bounds
in Theorem 2.3.1 show that

$$\log P_N^{-1} = \Theta(\sqrt{N}).$$

□

This implies that the convergence of the total error probability is sub-exponential;
more precisely, the exponent is essentially $\sqrt{N}$.

In the special case where $(\alpha_0, \beta_0) \in S$, the Type I and Type II error probabilities
decay to 0 with exponent $\sqrt{N}$ individually. Moreover, it is easy to show that the
exponent is still $\sqrt{N}$ even if the prior probabilities are unequal.

Given $L_0 \in (0, 1)$ and $\varepsilon \in (0, 1)$, suppose that we wish to determine how many
sensors we need to have so that $P_N \le \varepsilon$. If $L_0 < 1/2$, then the solution is simply to
find an $N$ (e.g., the smallest) satisfying the inequality

$$\sqrt{N} (\log L_0^{-1} - 1) \ge -\log \varepsilon.$$

In consequence, we have

$$N \ge \left( (\log L_0^{-1} - 1)^{-1} \log \varepsilon \right)^2.$$

The smallest $N$ grows like $\Theta((\log \varepsilon)^2)$ (cf. [32], in which the smallest $N$ has a larger
growth rate). If $L_0 \ge 1/2$, then by Proposition 2.3.7 we can deduce how many levels
$k$ are required so that $L_k \le 1/2$:

$$\sqrt[4]{N} \log L_0^{-1} \ge -\log \frac{1}{2} = 1.$$

Therefore, $N$ has to satisfy

$$N \ge (\log L_0^{-1})^{-4},$$

which, with $N = 2^k$, implies that

$$k \ge 4 \log (\log L_0^{-1})^{-1}.$$
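The two sizing rules above are simple enough to evaluate directly. The following sketch is illustrative only: the function names are hypothetical, logarithms are taken to base 2 as in the rest of the chapter, and the returned values are the smallest integers satisfying the stated sufficient inequalities, not necessarily the smallest trees that achieve the target.

```python
import math

def sensors_needed(l0, eps):
    """Smallest integer N with sqrt(N) * (log2(1/l0) - 1) >= log2(1/eps); needs l0 < 1/2."""
    if not (0.0 < l0 < 0.5) or not (0.0 < eps < 1.0):
        raise ValueError("requires 0 < l0 < 1/2 and 0 < eps < 1")
    margin = math.log2(1.0 / l0) - 1.0      # positive because l0 < 1/2
    return math.ceil((math.log2(1.0 / eps) / margin) ** 2)

def levels_to_reach_half(l0):
    """Smallest integer k with 2**(k/4) * log2(1/l0) >= 1, for 1/2 <= l0 < 1."""
    if not (0.5 <= l0 < 1.0):
        raise ValueError("requires 1/2 <= l0 < 1")
    return max(0, math.ceil(4.0 * math.log2(1.0 / math.log2(1.0 / l0))))

print(sensors_needed(0.25, 2.0 ** -10))
print(levels_to_reach_half(0.9))
```

Since the underlying bounds are stated for balanced trees of even height, a practical design would presumably round the first value up to the next power of 4.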

Combining with the above analysis for the case where $L_0 < 1/2$, we can then
determine the number of sensors required so that $P_N \le \varepsilon$.

2.4.2 Crummy sensors


In this part we allow the total error probability of each sensor, denoted by $L_0^{(N)}$, to
depend on $N$ but still to be constant across sensors.

If $L_0^{(N)}$ is bounded by some constant $L \in (0, 1)$ for all $N$, then clearly $P_N \to 0$. It
is more interesting to consider $L_0^{(N)} \to 1$, which means that sensors are asymptotically
crummy.

Proposition 2.4.2. Suppose that $L_0^{(N)} = 1 - \eta_N$ with $\eta_N \to 0$.

(1) If $\eta_N \ge c_1/\sqrt[4]{N}$, then $P_N \le e^{-c_1}$.

(2) If $\eta_N = \omega(1/\sqrt[4]{N})$, then $P_N \to 0$.

(3) If $\eta_N \le c_2/\sqrt{N}$, then $P_N \ge e^{-c_2}$.

(4) If $\eta_N = o(1/\sqrt{N})$, then $P_N \to 1$.


Proof. First we consider part (1). We have

$$\sqrt[4]{N} \log (L_0^{(N)})^{-1} = -\sqrt[4]{N} \log(1 - \eta_N).$$

But as $x \to 0$, $-\log(1 - x) \sim x/\ln(2)$, from which we obtain

$$\sqrt[4]{N} \log (L_0^{(N)})^{-1} \sim \eta_N \sqrt[4]{N} / \ln(2).$$

From Proposition 2.3.7, it is easy to see that if we have $\eta_N \ge c_1/\sqrt[4]{N}$, then for
sufficiently large $N$ we obtain

$$\log P_N^{-1} \ge \sqrt[4]{N} \log (L_0^{(N)})^{-1} \ge c_1/\ln(2),$$

that is,

$$P_N \le 2^{-c_1/\ln(2)} = e^{-c_1}.$$

Moreover, if $\eta_N \sqrt[4]{N} \to \infty$, that is, $\eta_N = \omega(1/\sqrt[4]{N})$, then $P_N \to 0$. This finishes the
proof for part (2).

Next we consider parts (3) and (4). We have

$$\sqrt{N} \log (L_0^{(N)})^{-1} = -\sqrt{N} \log(1 - \eta_N),$$

from which we obtain

$$\sqrt{N} \log (L_0^{(N)})^{-1} \sim \eta_N \sqrt{N} / \ln(2).$$

From Theorem 2.3.1, it is easy to see that if we have $\eta_N \le c_2/\sqrt{N}$, then for sufficiently
large $N$ we obtain

$$\log P_N^{-1} \le \sqrt{N} \log (L_0^{(N)})^{-1} \le c_2/\ln(2),$$

that is,

$$P_N \ge 2^{-c_2/\ln(2)} = e^{-c_2}.$$

Moreover, if $\eta_N \sqrt{N} \to 0$, that is, $\eta_N = o(1/\sqrt{N})$, then $P_N \to 1$.

□

Using part (3) of the above proposition, we derive a necessary condition for $P_N \to 0$.

Corollary 2.4.1. Suppose that $L_0^{(N)} = 1 - \eta_N$ with $\eta_N \to 0$. Then, $P_N \to 0$ implies
that $\eta_N = \omega(1/\sqrt{N})$.

2.4.3  Fixed  point  problem 

We encountered the following mathematical problem during our research. Let $\alpha \in [0, 1]$ be
given. Define a sequence of numbers $\gamma_k$ by

$$\gamma_k = \sqrt{1 - \sqrt{1 - \cdots \sqrt{1 - (1 - \cdots (1 - (1 - \alpha)^2)^2 \cdots)^2}}},$$

where the number of nested squares and square roots is $k$. So, for example, $\gamma_0 = \alpha$,
$\gamma_1 = \sqrt{1 - (1 - \alpha)^2}$, and so on.

Define $\theta = (3 - \sqrt{5})/2$, which is simply two minus the golden ratio, or one minus
the reciprocal of the golden ratio. In other words, the following expressions are all
equal to the golden ratio: $2 - \theta$, $1/(1 - \theta)$, and $1/\sqrt{\theta}$.

Claim: For $\alpha \in [0, \theta]$,

• the odd subsequence of $\{\gamma_k\}$ converges from above to $\sqrt{\alpha}$, and

• the even subsequence of $\{\gamma_k\}$ converges from below to $1 - (1 - \alpha)^2$.

For $\alpha \in [\theta, 1]$,

• the odd subsequence of $\{\gamma_k\}$ converges from above to $1 - (1 - \alpha)^2$, and

• the even subsequence of $\{\gamma_k\}$ converges from below to $\sqrt{\alpha}$.

Equivalently, for each $\alpha \in [0, 1]$, the odd subsequence converges from above
to $\max(\sqrt{\alpha}, 1 - (1 - \alpha)^2)$, and the even subsequence converges from below to
$\min(\sqrt{\alpha}, 1 - (1 - \alpha)^2)$.

We have no direct proof of this claim. However, what we can easily show is that
the odd subsequence is monotone decreasing and bounded below by $\max(\sqrt{\alpha}, 1 -
(1 - \alpha)^2)$, and that the even subsequence is monotone increasing and bounded above
by $\min(\sqrt{\alpha}, 1 - (1 - \alpha)^2)$.


Functional  Representation  of  Problem 

Define the function $\tau_k : [0, 1] \to [0, 1]$ such that $\gamma_k = \tau_k(\alpha)$. Then, it is apparent that
$\tau_k$ satisfies the recursion

$$\tau_{k+1}(\alpha) = \sqrt{1 - \tau_k((1 - \alpha)^2)}, \quad \alpha \in [0, 1], \tag{2.3}$$

with $\tau_0(\alpha) = \alpha$. We can similarly write a recursion relating $\tau_{k+2}$ to $\tau_k$:

$$\tau_{k+2}(\alpha) = \sqrt{1 - \sqrt{1 - \tau_k((1 - (1 - \alpha)^2)^2)}}, \quad \alpha \in [0, 1]. \tag{2.4}$$

Starting with $\tau_0(\alpha) = \alpha$ gives rise to the even subsequence, and starting with $\tau_1(\alpha) =
\sqrt{1 - (1 - \alpha)^2}$ gives rise to the odd subsequence. The convergence claim before
amounts to pointwise convergence claims about these subsequences of functions. But
since we have a recursion for them, they can only converge to fixed points of the
recursion (2.4).
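The recursion (2.3) makes the sequence easy to evaluate numerically, which gives a quick sanity check of the monotonicity and boundedness properties mentioned with the claim. The sketch below is not part of the original analysis; the function name `gamma` and the sample points 0.1 and 0.5 (one on each side of $\theta \approx 0.382$) are choices made here for illustration, and only the first few iterates are computed to stay clear of floating-point cancellation.

```python
import math

def gamma(k, a):
    """Evaluate tau_k(a) using tau_{k+1}(a) = sqrt(1 - tau_k((1 - a)**2)), tau_0(a) = a."""
    if k == 0:
        return a
    # max(0, .) only guards against a tiny negative value caused by rounding
    return math.sqrt(max(0.0, 1.0 - gamma(k - 1, (1.0 - a) ** 2)))

def limits(a):
    """Conjectured limits (min, max) of the even and odd subsequences at the point a."""
    s, q = math.sqrt(a), 1.0 - (1.0 - a) ** 2
    return min(s, q), max(s, q)

for a in (0.1, 0.5):
    lo, hi = limits(a)
    odd = [gamma(k, a) for k in (1, 3, 5)]
    even = [gamma(k, a) for k in (0, 2, 4)]
    print(f"a={a}: even {even} below {lo:.4f}; odd {odd} above {hi:.4f}")
```

The printed iterates only illustrate the behavior at two points; they are of course no substitute for a proof of convergence.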

More specifically, $\tau : [0, 1] \to [0, 1]$ is a fixed point of the first recursion (2.3) if
and only if

$$\tau(\alpha) = \sqrt{1 - \tau((1 - \alpha)^2)}, \quad \alpha \in [0, 1]. \tag{2.5}$$

With some algebraic manipulations, we can rewrite this as

$$\tau(\alpha) = 1 - \tau(1 - \sqrt{\alpha})^2, \quad \alpha \in [0, 1]. \tag{2.6}$$

Similarly, for a fixed point of (2.4),

$$\tau(\alpha) = \sqrt{1 - \sqrt{1 - \tau((1 - (1 - \alpha)^2)^2)}}, \quad \alpha \in [0, 1]. \tag{2.7}$$

If we can show that the two desired functions $\max(\sqrt{\alpha}, 1 - (1 - \alpha)^2)$ and $\min(\sqrt{\alpha}, 1 -
(1 - \alpha)^2)$ are the only “legitimate” solutions of (2.7), then we are done.

Write the recursion for $\{\tau_k\}$ as $\tau_{k+1} = \Phi \tau_k$, where $\Phi$ is the operator defined
by (2.3). The recursion for the even and odd subsequences is $\tau_{k+2} = \Phi^2 \tau_k$, where
$\Phi^2$ means $\Phi$ composed with itself. When we speak of a fixed point of $\Phi$, we mean a
function $\tau : [0, 1] \to [0, 1]$ such that $\tau = \Phi \tau$. Two functions $\rho_1$ and $\rho_2$ constitute an
orbit of period 2 of $\Phi$ if $\rho_1 = \Phi \rho_2$ and $\rho_2 = \Phi \rho_1$.

Some  observations: 

• Any fixed point of $\Phi$ is also a fixed point of $\Phi^2$. Moreover, two functions that
form an orbit of $\Phi$ of period 2 are both fixed points of $\Phi^2$.

• Conversely, any fixed point of $\Phi^2$ is either a fixed point or a point on an orbit of
$\Phi$ of period 2. For example, the constant functions 0 and 1 are fixed points of
$\Phi^2$, but they are not fixed points of $\Phi$. Instead, they constitute an orbit of period
2.

• The functions on $[0, 1]$ given by $\sqrt{\alpha}$ and $1 - (1 - \alpha)^2$ are both fixed points of $\Phi$
(and hence also $\Phi^2$). Moreover, the constant function $\sqrt{\theta}$ (which is also equal to
$1 - \theta$, $1 - (1 - \theta)^2$, and $(1/\theta) - 2$) is a fixed point of $\Phi$ (and hence also $\Phi^2$).

• Fixing $\alpha = 0$ in (2.7), we see that $\tau(0)$ can only take values 0, 1, or $\sqrt{\theta}$. The
same can be said for $\tau(1)$ and $\tau(\theta)$. Any fixed point must satisfy these constraints
at $\alpha = 0$, 1, and $\theta$.

• $\Phi$ maps a function on $[0, \theta]$ to a function on $[\theta, 1]$, and vice versa. So if $\rho_1$
and $\rho_2$ are fixed points of $\Phi$, then the two functions given by $\rho_1(\alpha)\mathbf{1}_{[0,\theta]}(\alpha) +
\rho_2(\alpha)\mathbf{1}_{(\theta,1]}(\alpha)$ and $\rho_2(\alpha)\mathbf{1}_{[0,\theta]}(\alpha) + \rho_1(\alpha)\mathbf{1}_{(\theta,1]}(\alpha)$ constitute an orbit of $\Phi$ of
period 2. These are functions obtained from $\rho_1$ and $\rho_2$ by “swapping” them on
the interval $[0, \theta]$.

• Similarly, if $\rho_1$ and $\rho_2$ constitute an orbit of $\Phi$, then the two functions given by
$\rho_1(\alpha)\mathbf{1}_{[0,\theta]}(\alpha) + \rho_2(\alpha)\mathbf{1}_{(\theta,1]}(\alpha)$ and $\rho_2(\alpha)\mathbf{1}_{[0,\theta]}(\alpha) + \rho_1(\alpha)\mathbf{1}_{(\theta,1]}(\alpha)$ are both
fixed points of $\Phi$. For example, both functions $\mathbf{1}_{[0,\theta]}$ and $\mathbf{1}_{(\theta,1]}$ are fixed points
of $\Phi$.

• $\Phi^2$ maps a function on $[0, \theta]$ to a function on $[0, \theta]$ (and similarly on $[\theta, 1]$).
Hence, for $\Phi^2$, we can consider the intervals $[0, \theta]$ and $[\theta, 1]$ separately and
independently. In particular, if $\rho_1$ and $\rho_2$ are fixed points of $\Phi^2$, then so are
the functions given by $\rho_1(\alpha)\mathbf{1}_{[0,\theta]}(\alpha) + \rho_2(\alpha)\mathbf{1}_{(\theta,1]}(\alpha)$ and $\rho_2(\alpha)\mathbf{1}_{[0,\theta]}(\alpha) +
\rho_1(\alpha)\mathbf{1}_{(\theta,1]}(\alpha)$.

• Finally, given two functions $\rho_1$ and $\rho_2$ that either are both fixed points or
constitute an orbit of period 2 of $\Phi$, we can construct two fixed points of $\Phi^2$:

$\rho_1(\alpha)\mathbf{1}_{[0,\theta]}(\alpha) + \rho_2(\alpha)\mathbf{1}_{(\theta,1]}(\alpha)$ and $\rho_2(\alpha)\mathbf{1}_{[0,\theta]}(\alpha) + \rho_1(\alpha)\mathbf{1}_{(\theta,1]}(\alpha)$.

The two fixed points of $\Phi^2$ of interest in our original claim are $\sqrt{\alpha}\,\mathbf{1}_{[0,\theta]}(\alpha) +
(1 - (1 - \alpha)^2)\mathbf{1}_{(\theta,1]}(\alpha)$ and $(1 - (1 - \alpha)^2)\mathbf{1}_{[0,\theta]}(\alpha) + \sqrt{\alpha}\,\mathbf{1}_{(\theta,1]}(\alpha)$, which are
$\max(\sqrt{\alpha}, 1 - (1 - \alpha)^2)$ and $\min(\sqrt{\alpha}, 1 - (1 - \alpha)^2)$, respectively.


Other  Fixed  Points 

Define the function $F : [0, 1] \to [0, 1]$ by $F(x) = 1 - x$ (the “flip” function), and
the function $S : [0, 1] \to [0, 1]$ by $S(x) = x^2$ (the “square” function). Then, for
$\tau : [0, 1] \to [0, 1]$, the function $\Phi\tau$ can be written as $S^{-1}F\tau SF$ (in operator notation,
so that $SF$, say, means $S$ composed with $F$). Hence, $\tau$ is a fixed point of $\Phi$ if and only
if

$$\tau = S^{-1}F\tau SF, \tag{2.8}$$

which corresponds to (2.5). Algebraic manipulations are much easier using this
notation. For example, note that $F^{-1} = F$, from which we can easily derive

$$\tau = FS\tau FS^{-1}, \tag{2.9}$$

which corresponds to (2.6). The two-step version (2.7) is also easy to write:

$$\tau = (S^{-1}F)^2 \tau (SF)^2. \tag{2.10}$$

Note that the two functions $\sqrt{\alpha}$ and $1 - (1 - \alpha)^2$ in this new notation are $S^{-1}$ and
$FSF$, respectively.

To simplify the calculations, substitute $\sigma = \tau S$ in (2.8). Then,

$$\sigma(S^{-1}F) = (S^{-1}F)\sigma.$$

In other words, $\tau = \sigma S^{-1}$ satisfies (2.8) if and only if $\sigma$ commutes with $(S^{-1}F)$. Note
that $S^{-1}F(\sqrt{\theta}) = S^{-1}F(1 - \theta) = \sqrt{\theta}$, which shows that the constant function $\sqrt{\theta}$
commutes with $(S^{-1}F)$. It is clear that any power of $(S^{-1}F)$ commutes with itself.
From this, we generate an infinite family of such fixed points $\tau$:

$$\tau = (S^{-1}F)^n S^{-1}, \quad n \in \mathbb{Z}. \tag{2.11}$$

The two special functions $S^{-1}$ and $FSF$ identified before are special cases with $n = 0$
and $n = -2$, respectively.

Figure 2.12: Functions in (2.11).


We could similarly make the substitution $\tilde{\tau} = F\tau$ in (2.8), leading to $\tilde{\tau}(SF) =
(SF)\tilde{\tau}$. This gives rise to the family

$$\tau = F(SF)^m, \quad m \in \mathbb{Z}.$$

But note that this family reduces to the previous one by substituting $m = -n - 1$.
Similarly, working with (2.9) does not generate any new fixed points.
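The family (2.11) is easy to test numerically. The sketch below builds $(S^{-1}F)^n S^{-1}$ for a few integers $n$, applies the operator from (2.3), and measures how far the result is from the input on a grid. The helper names (`step`, `step_inverse`, `family`, `phi`) and the grid are assumptions made for this illustration; the inverse of $S^{-1}F$ is taken to be $FS$, that is, $x \mapsto 1 - x^2$.

```python
import math

def step(x):
    """S^{-1}F: flip, then square root, x -> sqrt(1 - x)."""
    return math.sqrt(max(0.0, 1.0 - x))

def step_inverse(x):
    """(S^{-1}F)^{-1} = FS: square, then flip, x -> 1 - x**2."""
    return 1.0 - x * x

def family(n):
    """Return tau = (S^{-1}F)^n S^{-1} as a function of alpha, for any integer n."""
    def tau(alpha):
        x = math.sqrt(alpha)                      # S^{-1}
        for _ in range(abs(n)):
            x = step(x) if n > 0 else step_inverse(x)
        return x
    return tau

def phi(tau):
    """Operator defined by (2.3): (Phi tau)(alpha) = sqrt(1 - tau((1 - alpha)**2))."""
    return lambda alpha: math.sqrt(max(0.0, 1.0 - tau((1.0 - alpha) ** 2)))

GRID = [i / 20 for i in range(1, 20)]             # interior points of [0, 1]
WORST_GAP = max(abs(phi(family(n))(a) - family(n)(a))
                for n in range(-3, 4) for a in GRID)
print(f"largest |Phi(tau) - tau| over the grid: {WORST_GAP:.2e}")
```

With this parametrization, `family(0)` is $\sqrt{\alpha}$ and `family(-2)` is $1 - (1 - \alpha)^2$, matching the two special cases singled out above.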

If we consider the two-step version, we arrive at the conclusion that $\tau = \sigma S^{-1}$
satisfies (2.10) if and only if $\sigma$ commutes with $(S^{-1}F)^2$. Naturally, the functions
in (2.11) and the constant function $\sqrt{\theta}$ also satisfy (2.10). Moreover, because $0 =
(S^{-1}F)^2(0)$ and $1 = (S^{-1}F)^2(1)$, the constant functions 0 and 1 also clearly satisfy
(2.10).

Figure 2.12 shows plots of several functions of interest on $[0, 1]$. The blue and
green plots are the functions in (2.11) for nonnegative and negative $n$, respectively. We
have pointed out the two cases $n = 0$ (i.e., $\sqrt{\alpha}$) and $n = -2$ (i.e., $1 - (1 - \alpha)^2$)
using solid lines (in contrast to the other dashed lines). The solid black vertical line is
at $\alpha = \theta$. The solid black horizontal line is the constant function $\sqrt{\theta}$. Notice that all
the blue and green plots intersect at $(\theta, \sqrt{\theta})$. The plots in Figure 2.12 depict a large
family of fixed points of (2.7), recalling that we can combine blue and green plots on
the intervals $[0, \theta]$ and $(\theta, 1]$.

Figure 2.13 overlays on Figure 2.12 four red plots, depicting the first four iterations
$k = 1, \ldots, 4$ of (2.3). The two red plots above $\sqrt{\alpha}$ in the region $[0, \theta]$ are the odd
iterates ($k = 1, 3$) and the two red plots below $1 - (1 - \alpha)^2$ in the region $[0, \theta]$ are the
even iterates ($k = 2, 4$). Note that the solid blue/green plots are the only ones that lie
between the red plots. Recall that the odd subsequence of (2.7) is monotone decreasing and
bounded below by $\max(\sqrt{\alpha}, 1 - (1 - \alpha)^2)$, and that the even subsequence is monotone
increasing and bounded above by $\min(\sqrt{\alpha}, 1 - (1 - \alpha)^2)$. Hence, among all the fixed
points in Figure 2.12, only these two are legitimate fixed points of (2.7).

Are there other fixed points of (2.7)? Recall that any fixed point $\tau = \sigma S^{-1}$ is
such that $\sigma$ commutes with $(S^{-1}F)^2$. We have pointed out that any integer power of
$(S^{-1}F)$ will do the job.

Figure 2.13: Functions in (2.11) together with subsequences of (2.3) in red.

Are there other functions that commute with $(S^{-1}F)^2$? One candidate is this:
Suppose that $(S^{-1}F)^2$ has a commutable factorization, which means that $(S^{-1}F)^2 =
AB = BA$ for functions $A : [0, 1] \to [0, 1]$ and $B : [0, 1] \to [0, 1]$. Then, any integer
power of $A$ and $B$ commutes with $(S^{-1}F)^2$. Clearly $A = B = S^{-1}F$ is such an
example. But are there others?


2.4.4  Comparison  of  simulation  results 

We end this section by comparing the quantitative behavior of the unit-threshold
likelihood-ratio rule with that of other fusion rules of interest. First, we define two
particular fusion rules that can be applied at an individual node:


•  OR  rule:  the  parent  node  decides  0  if  and  only  if  both  the  child  nodes  send  0; 

•  AND  rule:  the  parent  node  decides  1  if  and  only  if  both  the  child  nodes  send  1. 


Notice that the unit-threshold likelihood-ratio rule reduces to either the AND rule
or the OR rule, depending on the values of the Type I and Type II error probabilities
at the particular level of the tree. For our quantitative comparison, we consider three
system-wide fusion strategies that we will compare with the case that uses the
unit-threshold likelihood-ratio rule at every node:

• OR strategy: Every fusion uses the OR rule;

• AND strategy: Every fusion uses the AND rule;

• RAND strategy: At each level of the tree, we randomly pick either the AND rule
or the OR rule with equal probability, and independently over levels, and apply
that rule to all the nodes at that level.

Figure 2.14: Total error probability plots (horizontal axis: number of sensors). Dashed
line: centralized parallel fusion strategy. Solid line: unit-threshold likelihood-ratio rule
for balanced binary relay tree. Dotted line with ‘o’ marker: OR strategy. Dotted line
with ‘+’ marker: AND strategy. Dash-dot line: RAND strategy.

In  Fig.  2.14,  we  show  plots  of  the  total  error  probability  as  a  function  of  N  for  the  tree 
that  uses  the  unit-threshold  likelihood-ratio  rule  at  every  node  (the  one  analyzed  in  this 
report).  We  also  plot  the  total  error  probabilities  for  the  AND  and  OR  strategies,  as  well 
as  the  average  total  error  probability  over  100  independent  trials  of  the  RAND  strategy. 
For  comparison  purposes,  we  also  plot  the  error  probability  curve  of  the  centralized 
parallel  fusion  strategy. 

We can see from Fig. 2.14 that the total error probability for the centralized parallel
strategy decays to 0 faster than that of the binary relay tree that uses the unit-threshold
likelihood-ratio rule at every node. This is not surprising, because the former is known
to be exponential, as discussed earlier, while the latter is sub-exponential with exponent
$\sqrt{N}$, as shown in this report. The AND and OR strategies both result in total error
probabilities converging monotonically to 1/2, while the RAND strategy results in an
average total error probability that does not decrease much with $N$.
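The qualitative behavior of the deterministic strategies can be reproduced with a few lines of code. This is only a sketch and not the simulation behind Fig. 2.14: it assumes equal priors, identical sensors with a hypothetical operating point $\alpha_0 = \beta_0 = 0.2$, and the convention that $\alpha$ is the probability of deciding 1 under $H_0$ and $\beta$ the probability of deciding 0 under $H_1$. The RAND and centralized strategies are left out.

```python
def or_rule(alpha, beta):
    """Parent decides 0 iff both children send 0: false alarms grow, misses shrink."""
    return alpha * (2.0 - alpha), beta * beta

def and_rule(alpha, beta):
    """Parent decides 1 iff both children send 1: false alarms shrink, misses grow."""
    return alpha * alpha, beta * (2.0 - beta)

def lrt_rule(alpha, beta):
    """Unit-threshold likelihood-ratio rule, assumed to pick OR when alpha <= beta."""
    return or_rule(alpha, beta) if alpha <= beta else and_rule(alpha, beta)

def total_error(rule, alpha0, beta0, height):
    """Total error probability (alpha + beta)/2 at the root of a tree of given height."""
    a, b = alpha0, beta0
    for _ in range(height):
        a, b = rule(a, b)
    return 0.5 * (a + b)

ALPHA0 = BETA0 = 0.2   # hypothetical sensor operating point
for height in range(0, 11, 2):
    row = [total_error(r, ALPHA0, BETA0, height) for r in (lrt_rule, or_rule, and_rule)]
    print(f"N={2 ** height:5d}  LRT={row[0]:.3e}  OR={row[1]:.3e}  AND={row[2]:.3e}")
```

Under these assumptions the OR and AND columns climb toward 1/2 while the likelihood-ratio column decays, in line with the discussion of Fig. 2.14.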

We have studied the detection performance of balanced binary relay trees. We
precisely describe the evolution of error probabilities in the $(\alpha, \beta)$ plane as we move
up the tree. This allows us to deduce error probability bounds at the fusion center as
functions of $N$ under several different scenarios. These bounds show that the total
error probability converges to 0 sub-exponentially, with an exponent that is essentially
$\sqrt{N}$. In addition, we allow all sensors to be asymptotically crummy, in which case we
deduce the necessary and sufficient conditions for the total error probability to converge
to 0. All our results apply not only to the fusion center, but also to any other node in
the tree network. In other words, we can similarly analyze a sub-tree inside the original
tree network.

Chapter 3


Sensor Failures and
Communication Link Failures


In this chapter, we further study the detection performance of balanced binary relay
trees with sensor failures and communication link failures, building on the formulations in
Chapter II.


3.1  Sensor  Failures 

3.1.1  Problem  formulation 

We keep all the notation and definitions from Chapter II. Moreover, for the sensor
failure case, we assume that all sensors have identical failure probability $q_0$. Assuming
equal prior probabilities, we use the likelihood-ratio test [41] when fusing binary
messages at intermediate relay nodes and the fusion center. Consider the simple problem
of fusing binary messages passed to a node by its two immediate child nodes. Assume
that the two child nodes have identical Type I error probability $\alpha$, identical Type II
error probability $\beta$, and identical failure probability $q$.

Denote the Type I error, Type II error, and failure probabilities after the fusion by
$(\alpha', \beta', q')$. This parent node fails to provide any message to the node at the next level
if and only if both its child nodes fail to forward any message. Hence, we have

$$q' = q^2. \tag{3.1}$$

If one of the child nodes fails and the other one sends its message to the parent
node, then Type I and Type II error probabilities do not change since the parent node
receives only one binary message. The probability of this event is $2q(1-q)$, in which
case we have

\[ (\alpha', \beta') = (\alpha, \beta). \tag{3.2} \]

Figure 3.1: A balanced binary relay tree with height k. Circles represent sensors making
measurements. Diamonds represent relay nodes which fuse binary messages. The
rectangle at the root represents the fusion center making an overall decision.


If both child nodes send their messages to the parent node, then the scenario is the
same as that in [32] and [33]. The probability of this event is $(1-q)^2$, in which case
we have

\[
(\alpha', \beta') =
\begin{cases}
\left(1-(1-\alpha)^2,\ \beta^2\right), & \alpha \le \beta, \\
\left(\alpha^2,\ 1-(1-\beta)^2\right), & \alpha > \beta.
\end{cases}
\tag{3.3}
\]


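Let $\alpha'$ and $\beta'$ be the mean Type I and Type II error probabilities conditioned on the
event that at least one of these child nodes forwards its message to the parent node, i.e.,
the parent node has data. We have

\[ (\alpha', \beta', q') = f(\alpha, \beta, q), \tag{3.4} \]

where

\[
f(\alpha, \beta, q) :=
\begin{cases}
\left( \dfrac{(1-q)(2\alpha-\alpha^2)+2q\alpha}{1+q},\ \dfrac{(1-q)\beta^2+2q\beta}{1+q},\ q^2 \right), & \alpha \le \beta, \\[2ex]
\left( \dfrac{(1-q)\alpha^2+2q\alpha}{1+q},\ \dfrac{(1-q)(2\beta-\beta^2)+2q\beta}{1+q},\ q^2 \right), & \alpha > \beta.
\end{cases}
\tag{3.5}
\]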
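
The weights appearing in (3.5) can be recovered by conditioning (3.2) and (3.3) on the event that the parent node has data, which by (3.1) has probability $1 - q^2$. The short derivation below is a sketch of that step for the case $\alpha \le \beta$; it uses only (3.1)-(3.3) and introduces no new notation.

```latex
% Conditional weights, given that the parent has data (probability 1 - q^2):
\begin{align*}
\Pr(\text{both children report} \mid \text{parent has data})
  &= \frac{(1-q)^2}{1-q^2} = \frac{1-q}{1+q}, \\
\Pr(\text{exactly one child reports} \mid \text{parent has data})
  &= \frac{2q(1-q)}{1-q^2} = \frac{2q}{1+q}.
\end{align*}
% Averaging (3.3) and (3.2) with these weights, for \alpha \le \beta:
\begin{align*}
\alpha' &= \frac{1-q}{1+q}\,\bigl(1-(1-\alpha)^2\bigr) + \frac{2q}{1+q}\,\alpha
         = \frac{(1-q)(2\alpha-\alpha^2) + 2q\alpha}{1+q}, \\
\beta'  &= \frac{1-q}{1+q}\,\beta^2 + \frac{2q}{1+q}\,\beta
         = \frac{(1-q)\beta^2 + 2q\beta}{1+q}.
\end{align*}
```

The case $\alpha > \beta$ follows in the same way from the second branch of (3.3).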


Our assumption is that all sensors have the same error probabilities $(\alpha_0, \beta_0, q_0)$.
Therefore by (3.5), all relay nodes at level 1 will have the same error probability triplet
$(\alpha_1, \beta_1, q_1) = f(\alpha_0, \beta_0, q_0)$ (where $\alpha_1$ and $\beta_1$ are the conditional mean error
probabilities). Similarly by (3.4), we can calculate error probability triplets for nodes at all
other levels. We have

\[ (\alpha_{k+1}, \beta_{k+1}, q_{k+1}) = f(\alpha_k, \beta_k, q_k), \quad k = 1, 2, \ldots, \tag{3.6} \]

where $(\alpha_k, \beta_k, q_k)$ is the error probability triplet of nodes at the $k$th level of the tree.
Notice that if we let $q_0 = 0$, then the recursive relation reduces to the recursion in [33].
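
The recursion (3.5)-(3.6) is easy to iterate numerically. The following is a minimal sketch, not code from the report: the function names `fuse` and `trajectory` and the sample initial state are illustrative assumptions, and the update rule is the piecewise map $f$ with the conditional weights $(1-q)/(1+q)$ and $2q/(1+q)$.

```python
def fuse(alpha, beta, q):
    """Apply one step of the map f in (3.5) to a state (alpha, beta, q)."""
    w_both = (1.0 - q) / (1.0 + q)  # both children report, given the parent has data
    w_one = 2.0 * q / (1.0 + q)     # exactly one child reports, given the parent has data
    if alpha <= beta:
        next_alpha = w_both * (2.0 * alpha - alpha ** 2) + w_one * alpha
        next_beta = w_both * beta ** 2 + w_one * beta
    else:
        next_alpha = w_both * alpha ** 2 + w_one * alpha
        next_beta = w_both * (2.0 * beta - beta ** 2) + w_one * beta
    return next_alpha, next_beta, q * q


def trajectory(alpha0, beta0, q0, levels):
    """Return the states from the sensors (level 0) up to level `levels`."""
    states = [(alpha0, beta0, q0)]
    for _ in range(levels):
        states.append(fuse(*states[-1]))
    return states
```

For example, `trajectory(0.1, 0.3, 0.05, 6)` returns seven states whose third coordinate is squared at every level, in line with (3.1).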

The relation (3.6) allows us to consider $(\alpha_k, \beta_k, q_k)$ as a discrete dynamic system.
For the case where $q_0 = 0$, we have studied (see [33]) the precise evolution of the
sequence $\{(\alpha_k, \beta_k)\}$, derived total error probability bounds as functions of $N$, and
established asymptotic decay rates. In this report, we will study the case where $q_0 \neq 0$.
We will derive total error probability bounds and determine the decay rate of the total
error probability.

To develop intuition, let us start by looking at the single trajectory shown in Fig.
3.2(a), starting at the initial state $(\alpha_0, \beta_0, q_0)$. We observe that $q_k$ decreases very fast
to 0. In addition, as shown in Fig. 3.2(b), the trajectory approaches $\beta = \alpha$ at the
beginning. After $(\alpha_k, \beta_k)$ gets too close to $\beta = \alpha$, the next pair $(\alpha_{k+1}, \beta_{k+1})$ will be
repelled toward the other side of the line $\beta = \alpha$. This behavior is similar to the scenario
where $q = 0$. For the case where $q = 0$, there exists an invariant region in the sense
that the system stays in the invariant region once the system enters it [33]. Is there
an invariant region for the case where $q \neq 0$? We answer this question by precisely
describing this invariant region in $\mathbb{R}^3$.


Figure 3.2: (a) A typical trajectory of $(\alpha_k, \beta_k, q_k)$ in the $(\alpha, \beta, q)$ coordinates. (b) The
trajectory in (a) projected onto the $(\alpha, \beta)$ plane.


3.1.2  The  evolution  of  Type  I,  Type  II,  and  sensor  failure  error 
probabilities 

The relation (3.5) is symmetric about the hyperplanes $\alpha + \beta = 1$ and $\beta = \alpha$. Thus,
it suffices to study the evolution of the dynamic system only in the region bounded by
$\alpha + \beta < 1$, $\beta \ge \alpha$, and $0 \le q \le 1$. Let

\[ \mathcal{U} := \{ (\alpha, \beta, q) \ge 0 \mid \alpha + \beta < 1,\ \beta \ge \alpha,\ \text{and } 0 \le q \le 1 \} \]

be this triangular prism. Similarly, define the complementary triangular prism

\[ \mathcal{L} := \{ (\alpha, \beta, q) \ge 0 \mid \alpha + \beta < 1,\ \beta < \alpha,\ \text{and } 0 \le q \le 1 \}. \]

First, we denote the following region by

\[ B_1 := \left\{ (\alpha, \beta, q) \in \mathcal{U} \;\middle|\; \beta < \frac{-q + \sqrt{q^2 + (1-q)^2(2\alpha - \alpha^2) + 2q(1-q)\alpha}}{1-q} \right\}. \]

If $(\alpha_k, \beta_k, q_k) \in B_1$, then the next triplet $(\alpha_{k+1}, \beta_{k+1}, q_{k+1})$ jumps across the plane
$\beta = \alpha$ away from $(\alpha_k, \beta_k, q_k)$. More precisely, if $(\alpha_k, \beta_k, q_k) \in \mathcal{U}$, then $(\alpha_k, \beta_k, q_k) \in B_1$
if and only if $(\alpha_{k+1}, \beta_{k+1}, q_{k+1}) \in \mathcal{L}$.

It is easy to see from (3.5) and (3.6) that, if we start with $(\alpha_0, \beta_0, q_0) \in \mathcal{U} \setminus B_1$, then
before the system enters $B_1$, we have $\alpha_{k+1} > \alpha_k$ and $\beta_{k+1} < \beta_k$. Thus, the system
moves towards the $\beta = \alpha$ plane. Therefore, if the number of sensors $N$ is sufficiently
large, then the system is guaranteed to enter $B_1$.

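Next we consider the behavior of the system after it enters $B_1$. If $(\alpha_k, \beta_k, q_k) \in B_1$,
we consider the position of the next triplet $(\alpha_{k+1}, \beta_{k+1}, q_{k+1})$, i.e., consider the image
of $B_1$ under $f$, denoted by $R_{\mathcal{L}}$. Similarly we denote the reflection of $R_{\mathcal{L}}$ with respect
to $\beta = \alpha$ by $R_{\mathcal{U}}$. We find that

\[ R_{\mathcal{U}} := \left\{ (\alpha, \beta, q) \in \mathcal{U} \;\middle|\; \beta \le -\alpha + \frac{2\left(\sqrt{q^2 + (1-q^2)\alpha} - q\right)}{1-q} \right\}. \]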

The sets $R_{\mathcal{U}}$ and $B_1$ have some interesting properties. We denote the projection of
the upper boundary of $R_{\mathcal{U}}$ and $B_1$ onto the $(\alpha, \beta)$ plane for a fixed $q$ by $R_{\mathcal{U}}^{q}$ and $B_1^{q}$,
respectively. It is easy to see that if $q_1 < q_2$, then $R_{\mathcal{U}}^{q_1}$ lies above $R_{\mathcal{U}}^{q_2}$ in the $(\alpha, \beta)$
plane. Similarly, if $q_1 < q_2$, then $B_1^{q_1}$ lies above $B_1^{q_2}$ in the $(\alpha, \beta)$ plane. Moreover,
we have the following proposition.

Proposition 3.1.1. $B_1 \subset R_{\mathcal{U}}$.


Proof. $B_1$ and $R_{\mathcal{U}}$ share the same lower boundary $\beta = \alpha$. Thus, it suffices to prove
that the upper boundary of $B_1$ is below that of $R_{\mathcal{U}}$ for a fixed $q$, i.e., $R_{\mathcal{U}}^{q}$ lies above $B_1^{q}$
in the $(\alpha, \beta)$ plane.


The upper boundary of $B_1$ is

\[ \beta = \frac{-q + \sqrt{q^2 + (1-q)^2(2\alpha - \alpha^2) + 2q(1-q)\alpha}}{1-q}. \]

The upper boundary of $R_{\mathcal{U}}$ is

\[ \beta = -\alpha + 2\,\frac{\sqrt{q^2 + (1-q^2)\alpha} - q}{1-q}. \]

Notice that when $q = 0$, these boundaries reduce to the boundaries in [33]. We need to
prove the following:

\[ \frac{-q + \sqrt{q^2 + (1-q)^2(2\alpha - \alpha^2) + 2q(1-q)\alpha}}{1-q} \le -\alpha + 2\,\frac{\sqrt{q^2 + (1-q^2)\alpha} - q}{1-q}. \]

It suffices to show that

\[ \sqrt{q^2 + (1-q)^2(2\alpha - \alpha^2) + 2q(1-q)\alpha} \le -\alpha(1-q) - q + 2\sqrt{q^2 + (1-q^2)\alpha}. \]

Squaring both sides and simplifying, we have

\[ 2\sqrt{q^2 + (1-q^2)\alpha}\,\bigl(\alpha(1-q) + q\bigr) \le 2\bigl(q^2 + (1-q^2)\alpha\bigr) - (1-q)^2(\alpha - \alpha^2). \]

Again squaring both sides and simplifying, we have

\[
4\bigl(q^2 + (1-q^2)\alpha\bigr)\bigl(q^2 + 2q(1-q)\alpha + (1-q)^2\alpha^2 - q^2 - (1-q^2)\alpha + (1-q)^2(\alpha - \alpha^2)\bigr)
\le (1-q)^4(\alpha - \alpha^2)^2.
\]

Fortuitously, the left-hand side turns out to be identically 0. Thus, the inequality
holds. The reader can refer to Fig. 3.3(a) and Fig. 3.3(b) for plots of the upper
boundaries of $R_{\mathcal{U}}$ and $B_1$ for two fixed values of $q$.


□ 
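
As a sanity check on Proposition 3.1.1, the two upper boundaries can be compared numerically on a grid. The sketch below is only an illustration under stated assumptions: the helper names `b1_upper`, `ru_upper`, and `min_gap`, the grid size, and the sampled values of $q$ are not from the report; the two formulas are the boundary expressions given in the proof.

```python
import math


def b1_upper(alpha, q):
    """Upper boundary of B_1 at a fixed q, as a function of alpha."""
    inside = q ** 2 + (1 - q) ** 2 * (2 * alpha - alpha ** 2) + 2 * q * (1 - q) * alpha
    return (-q + math.sqrt(inside)) / (1 - q)


def ru_upper(alpha, q):
    """Upper boundary of R_U at a fixed q, as a function of alpha."""
    return -alpha + 2 * (math.sqrt(q ** 2 + (1 - q ** 2) * alpha) - q) / (1 - q)


def min_gap(q, points=1001):
    """Smallest value of ru_upper - b1_upper on an evenly spaced grid over [0, 1]."""
    return min(
        ru_upper(i / (points - 1), q) - b1_upper(i / (points - 1), q)
        for i in range(points)
    )
```

Up to floating-point rounding, `min_gap(q)` should be nonnegative for every $q \in [0, 1)$, with the two boundaries touching only at the endpoints $\alpha = 0$ and $\alpha = 1$, where $\alpha - \alpha^2 = 0$.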


Figure 3.3: (a) Upper boundaries for $R_{\mathcal{U}}$ and $B_1$ for $q = 0.1$. (b) Upper boundaries for
$R_{\mathcal{U}}$ and $B_1$ for $q = 0.01$.

We denote the region $R_{\mathcal{U}} \cup R_{\mathcal{L}}$ by $R$. We show below that $R$ is an invariant region
in the sense that once the system enters $R$, it stays there.

Proposition 3.1.2. If $(\alpha_{k_0}, \beta_{k_0}, q_{k_0}) \in R$ for some $k_0$, then $(\alpha_k, \beta_k, q_k) \in R$ for all
$k \ge k_0$.

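Proof. Without loss of generality, we assume $(\alpha_k, \beta_k, q_k) \in R_{\mathcal{U}}$. We know that $R_{\mathcal{L}}$
is the image of $\mathcal{U}$ in $\mathcal{L}$. Thus if the next state $(\alpha_{k+1}, \beta_{k+1}, q_{k+1}) \in \mathcal{L}$, then it must
be inside $R_{\mathcal{L}}$. We already have $q_{k+1} < q_k$, which indicates that $R_{\mathcal{U}}^{q_{k+1}}$ lies above
$R_{\mathcal{U}}^{q_k}$ in the $(\alpha, \beta)$ plane. Moreover, for a fixed $q$, the upper boundary $R_{\mathcal{U}}^{q}$ is monotone
increasing in the $(\alpha, \beta)$ plane. We already know that $\alpha_{k+1} > \alpha_k$ and $\beta_{k+1} < \beta_k$. As
a result, if the next state $(\alpha_{k+1}, \beta_{k+1}, q_{k+1}) \in \mathcal{U}$, then the next state is in fact inside
$R_{\mathcal{U}}$.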


□ 

We have shown that the system enters $B_1$ after certain levels of fusion. By the fact
that $B_1 \subset R_{\mathcal{U}}$, we conclude that the system enters $R_{\mathcal{U}}$ at some level of the tree and
stays inside the invariant region $R$ at all levels above.

In the next section, we will consider the step-wise reduction of the total error probability
when the system lies inside the invariant region and deduce upper and lower
bounds for the total error probability.


3.1.3  Error  probability  bounds 


Recall that the total detection error probability for a node at the $k$-th level is $(\alpha_k + \beta_k)/2$
because of the equal-prior assumption. Let $L_k = \alpha_k + \beta_k$, which is twice the total error
probability. We will derive bounds on $\log L_k^{-1}$, whose growth rate is related to the rate
of convergence of $L_k$ to 0.


Proposition 3.1.3. Let $L_{k+1}^{(q)}$ be (twice) the total error probability at the next level from the
current state $(\alpha_k, \beta_k, q)$. Suppose that $(\alpha_k, \beta_k, q_1)$ and $(\alpha_k, \beta_k, q_2) \in \mathcal{U}$. If $q_1 < q_2$,
then

\[ L_{k+1}^{(q_1)} \le L_{k+1}^{(q_2)}, \]

with equality if and only if $\alpha_k = \beta_k$.


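Proof. From (3.5), we have

\[ L_{k+1}^{(q)} = \frac{1-q}{1+q}\, L_{k+1}^{(0)} + \frac{2q}{1+q}\,(\alpha_k + \beta_k), \]

where $L_{k+1}^{(0)} = 2\alpha_k - \alpha_k^2 + \beta_k^2$.

It is easy to show that $2\alpha_k - \alpha_k^2 + \beta_k^2 \le \alpha_k + \beta_k$. Indeed,

\[ 2\alpha_k - \alpha_k^2 + \beta_k^2 \le \alpha_k + \beta_k \iff \alpha_k - \alpha_k^2 \le \beta_k - \beta_k^2. \]

Since $\alpha_k + \beta_k < 1$ and $\beta_k \ge \alpha_k$, we have $\beta_k - 1/2 < 1/2 - \alpha_k$ and $1/2 - \beta_k \le 1/2 - \alpha_k$,
so $\beta_k$ is at least as close to $1/2$ as $\alpha_k$ is. Notice that the
function $x - x^2$ peaks at $x = 1/2$. Hence, $2\alpha_k - \alpha_k^2 + \beta_k^2 \le \alpha_k + \beta_k$ with equality if
and only if $\alpha_k = \beta_k$.

Notice that

\[ \frac{1-q}{1+q} + \frac{2q}{1+q} = 1. \]

Therefore we can write

\[ L_{k+1}^{(q_1)} = p_1 L_{k+1}^{(0)} + (1 - p_1)(\alpha_k + \beta_k), \]

where $p_1 = (1 - q_1)/(1 + q_1)$. Let $p_2 = (1 - q_2)/(1 + q_2)$; it is easy to see that
$p_1 > p_2$. Thus we have

\begin{align*}
L_{k+1}^{(q_1)} &= p_1 L_{k+1}^{(0)} + (1 - p_1)(\alpha_k + \beta_k) + (p_2 - p_1) L_{k+1}^{(0)} - (p_2 - p_1) L_{k+1}^{(0)} \\
&\le p_1 L_{k+1}^{(0)} + (1 - p_1)(\alpha_k + \beta_k) + (p_2 - p_1) L_{k+1}^{(0)} - (p_2 - p_1)(\alpha_k + \beta_k) \\
&= L_{k+1}^{(q_2)}.
\end{align*}

Moreover, equality holds if and only if $\alpha_k = \beta_k$. □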

From Proposition 3.1.3, we immediately deduce that for any $q_1 > 0$,

\[ L_{k+1}^{(0)} \le L_{k+1}^{(q_1)}. \]

This means that the decay of the total error probability for a single step is the fastest
when $q = 0$. As a result, for the case where $q \neq 0$, the step-wise shrinkage of the total
error probability cannot be faster than the case where $q = 0$, where the asymptotic
decay exponent is $\sqrt{N}$ [33].
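
The monotonicity in $q$ stated in Proposition 3.1.3 can be illustrated numerically. The sketch below is an assumption-labeled example: the name `next_total` and the sample states are illustrative, and the formula is the convex combination of $L_{k+1}^{(0)} = 2\alpha_k - \alpha_k^2 + \beta_k^2$ and $\alpha_k + \beta_k$ used in the proof, valid for states with $\alpha_k \le \beta_k$.

```python
def next_total(alpha, beta, q):
    """Twice the total error probability one level up, for a state with alpha <= beta."""
    rho = (1.0 - q) / (1.0 + q)                        # weight on the no-failure update
    no_failure = 2.0 * alpha - alpha ** 2 + beta ** 2  # L_{k+1}^{(0)}
    return rho * no_failure + (1.0 - rho) * (alpha + beta)


def is_nondecreasing_in_q(alpha, beta, q_values):
    """Check that next_total does not decrease along an increasing list of q values."""
    totals = [next_total(alpha, beta, q) for q in q_values]
    return all(x <= y + 1e-15 for x, y in zip(totals, totals[1:]))
```

With $\alpha_k = \beta_k$ the value does not depend on $q$ at all, which matches the equality case of the proposition.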

Notice that from (3.1), the decay of $q_k$ is quadratic, which is much faster than the
decay rate of $L_k$. Moreover, it is easy to see that the decay of $q_k$ is faster than the decay
of $\alpha_k$ and of $\beta_k$. Hence, it is natural to assume that $q_k \le \alpha_k$ and $q_k \le \beta_k$ when we
consider the step-wise shrinkage of the total error probability in the invariant region.
Next we give upper and lower bounds for the ratio $L_{k+2}/L_k^2$.

Proposition 3.1.4. Suppose that $(\alpha_k, \beta_k, q_k) \in R$, $\alpha_k \ge q_k$, and $\beta_k \ge q_k$. Then,

\[ \frac{1}{2} \le \frac{L_{k+2}}{L_k^2} \le 4. \]


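Proof. First, we consider the lower bound. The evolution of the system is

\[ (\alpha_k, \beta_k, q_k) \to (\alpha_{k+1}, \beta_{k+1}, q_k) \to (\alpha_{k+2}, \beta_{k+2}, q_k). \]

From Proposition 3.1.3, we have

\[ L_{k+2}^{(0)} \le L_{k+2}, \]

where $L_{k+2}^{(0)} = 2\alpha_{k+1} - \alpha_{k+1}^2 + \beta_{k+1}^2$ as defined before. To prove $1/2 \le L_{k+2}/L_k^2$, it
suffices to show that $1/2 \le L_{k+2}^{(0)}/L_k^2$.

If $(\alpha_k, \beta_k) \in R_{\mathcal{U}} \setminus B_1$, then

\[ \frac{L_{k+2}^{(0)}}{L_k^2} = \frac{2\alpha_{k+1} - \alpha_{k+1}^2 + \beta_{k+1}^2}{(\alpha_k + \beta_k)^2}. \]

We have

\[ \alpha_{k+1} = \frac{1-q_k}{1+q_k}\,(2\alpha_k - \alpha_k^2) + \frac{2q_k}{1+q_k}\,\alpha_k \ge \alpha_k \]

and

\[ \beta_{k+1} = \frac{1-q_k}{1+q_k}\,\beta_k^2 + \frac{2q_k}{1+q_k}\,\beta_k \ge \beta_k^2. \]

Thus, it suffices to show that

\[ \frac{2\alpha_k - \alpha_k^2 + \beta_k^4}{(\alpha_k + \beta_k)^2} \ge \frac{1}{2}. \]

It is easy to see that

\[ 2(2\alpha_k - \alpha_k^2) \ge 1 - (1 - \alpha_k)^4. \]

Hence, it suffices to show that

\[ 1 - (1 - \alpha_k)^4 + \beta_k^4 \ge (\alpha_k + \beta_k)^2, \]

which has been proved in [33].

If $(\alpha_k, \beta_k) \in B_1$, then it suffices to show that

\[ \frac{\alpha_{k+1}^2 + 2\beta_{k+1} - \beta_{k+1}^2}{(\alpha_k + \beta_k)^2} \ge \frac{1}{2}. \]

We have

\[ \alpha_{k+1} = \frac{1-q_k}{1+q_k}\,(2\alpha_k - \alpha_k^2) + \frac{2q_k}{1+q_k}\,\alpha_k \ge \alpha_k, \]

and, as above, $2\beta_{k+1} - \beta_{k+1}^2 \ge \beta_{k+1} \ge \beta_k^2$. Thus, it suffices to prove

\[ \frac{\alpha_k^2 + \beta_k^2}{(\alpha_k + \beta_k)^2} \ge \frac{1}{2}, \]

which is easy to see.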


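Next we prove the upper bound of the ratio $L_{k+2}/L_k^2$.

If $(\alpha_k, \beta_k) \in R_{\mathcal{U}} \setminus B_1$, then

\[ \frac{L_{k+2}}{L_k^2} \le \frac{L_{k+1}}{L_k^2} = \frac{1-q_k}{1+q_k} \cdot \frac{2\alpha_k - \alpha_k^2 + \beta_k^2}{(\alpha_k + \beta_k)^2} + \frac{2q_k}{(1+q_k)(\alpha_k + \beta_k)}. \]

It is easy to see that

\[ \frac{2q_k}{(1+q_k)(\alpha_k + \beta_k)} \le 1. \]

Next, we can prove that

\[ \frac{2\alpha_k - \alpha_k^2 + \beta_k^2}{(\alpha_k + \beta_k)^2} \le 2, \]

which is equivalent to

\[ \phi(\alpha_k, \beta_k) := 2\alpha_k - 3\alpha_k^2 - \beta_k^2 - 4\alpha_k\beta_k \le 0. \]

We have

\[ \frac{\partial \phi}{\partial \beta_k} = -2\beta_k - 4\alpha_k \le 0. \]

Thus, we can consider the lower boundary of this region, which is the upper boundary
of $B_1$:

\[ \beta = \frac{-q + \sqrt{q^2 + (1-q)^2(2\alpha - \alpha^2) + 2q(1-q)\alpha}}{1-q}. \]

Denote $\psi(\alpha, q) := \sqrt{q^2 + (1-q)^2(2\alpha - \alpha^2) + 2q(1-q)\alpha}$. We have

\begin{align*}
\phi(\alpha_k, \beta_k) &= -\frac{q_k^2 + q_k^2 + (1-q_k)^2(2\alpha_k - \alpha_k^2) + 2q_k(1-q_k)\alpha_k - 2q_k\psi(\alpha_k, q_k)}{(1-q_k)^2} - 4\alpha_k\beta_k + 2\alpha_k - 3\alpha_k^2 \\
&= \frac{2q_k\beta_k}{1-q_k} - 4\alpha_k\beta_k - \frac{2q_k\alpha_k}{1-q_k} - 2\alpha_k^2.
\end{align*}

It is easy to see that

\[ \frac{2q_k\beta_k}{1-q_k} - 4\alpha_k\beta_k \le 0. \]

Hence, we have

\[ \frac{1-q_k}{1+q_k} \cdot \frac{2\alpha_k - \alpha_k^2 + \beta_k^2}{(\alpha_k + \beta_k)^2} \le 2 \]

and

\[ \frac{L_{k+2}}{L_k^2} \le 3 \le 4. \]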

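For the case where $(\alpha_k, \beta_k) \in B_1$, we prove that the ratio is upper bounded by 4.
The evolution of the system is

\[ (\alpha_k, \beta_k, q_k) \to (\alpha_{k+1}, \beta_{k+1}, q_k) \to (\alpha_{k+2}, \beta_{k+2}, q_k). \]

It is easy to see that

\[ L_{k+2}^{(q_k)} \ge L_{k+2}, \]

where $L_{k+2}^{(q_k)}$ denotes the total error probability if we use $q_k$ to calculate from $L_k$ to
$L_{k+2}$. Therefore, it suffices to prove that

\[ L_{k+2}^{(q_k)} - 4L_k^2 = \alpha_{k+2} + \beta_{k+2} - 4(\alpha_k + \beta_k)^2 \le 0. \]

We have

\[ \beta_{k+1} = \frac{1-q_k}{1+q_k}\,\beta_k^2 + \frac{2q_k}{1+q_k}\,\beta_k. \]

From the assumption that $\beta_k \ge q_k$, we have

\[ \frac{\partial \beta_{k+1}}{\partial \beta_k} = \frac{2(1-q_k)}{1+q_k}\,\beta_k + \frac{2q_k}{1+q_k} \le 4\beta_k. \]

It is easy to get that

\[ \beta_{k+2} = \frac{1-q_k}{1+q_k}\,(2\beta_{k+1} - \beta_{k+1}^2) + \frac{2q_k}{1+q_k}\,\beta_{k+1}. \]

Therefore, we have

\[ \frac{\partial \beta_{k+2}}{\partial \beta_k} = \frac{2(1-q_k)}{1+q_k}\,(1 - \beta_{k+1})\,\frac{\partial \beta_{k+1}}{\partial \beta_k} + \frac{2q_k}{1+q_k}\,\frac{\partial \beta_{k+1}}{\partial \beta_k} \le 8\beta_k. \]

Thus,

\[ \frac{\partial \bigl(L_{k+2}^{(q_k)} - 4L_k^2\bigr)}{\partial \beta_k} \le 8\beta_k - 8\alpha_k - 8\beta_k \le 0. \]

Therefore, we can consider the lower boundary of $B_1$, $\beta_k = \alpha_k$. We have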

L<»>  -  4ii  = 


(1  +  9fc)3 


(1  +  Qk)2 


2(1  qk)2  2  .  &Qk  2  ^  n 

+  d  +  ®)2  ‘  (TTrfat  “  16“‘  s 

which holds in region $B_1$. Hence, the ratio is upper bounded by 4 in this region.


□ 

Proposition 3.1.4 gives rise to bounds on the change in the total error probability
every two steps: $L_{k+2} \le 4L_k^2$ and $L_{k+2} \ge L_k^2/2$. From these, we can derive bounds
for $\log P_N^{-1}$ for even-height trees, i.e., $k = \log N$ is even. Let $P_N = L_{\log N}$, namely,
(twice) the total error probability at the fusion center. We will derive bounds for $\log P_N^{-1}$.

Theorem 3.1.1. Suppose that $(\alpha_0, \beta_0, q_0) \in R$ and $\log N$ is even. Then,

\[ \sqrt{N}\left(\log L_0^{-1} - 2\right) \le \log P_N^{-1} \le \sqrt{N}\left(\log L_0^{-1} + 1\right). \]

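Proof. If $(\alpha_0, \beta_0, q_0) \in R$, then we have $(\alpha_k, \beta_k, q_k) \in R$ for $k = 0, 1, \ldots, \log N - 2$.

From Proposition 3.1.4, we have

\[ L_{k+2} = a_k L_k^2 \]

for $k = 0, 1, \ldots, \log N - 2$ and some $a_k \in [1/2, 4]$. Therefore, for $k = 2, 4, \ldots, \log N$,
we have, after re-indexing the constants of the two-step recursions by $i = 0, 1, \ldots, (k-2)/2$,

\[ L_k = a_{(k-2)/2} \cdot a_{(k-4)/2}^{2} \cdots a_0^{2^{(k-2)/2}}\, L_0^{2^{k/2}}, \]

where $a_i \in [1/2, 4]$. Substituting $k = \log N$, we have

\[ \log P_N^{-1} = -\log a_{(k-2)/2} - 2\log a_{(k-4)/2} - \cdots - 2^{(k-2)/2}\log a_0 + \sqrt{N}\,\log L_0^{-1}. \]

Notice that $\log L_0^{-1} > 0$ and for each $i$, $-1 \le \log a_i \le 2$. Thus,

\[ \log P_N^{-1} \le \sqrt{N}\,\log L_0^{-1} + \sqrt{N} = \sqrt{N}\left(\log L_0^{-1} + 1\right). \]

Finally,

\[ \log P_N^{-1} \ge -2\sqrt{N} + \sqrt{N}\,\log L_0^{-1} = \sqrt{N}\left(\log L_0^{-1} - 2\right). \]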


□ 
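
The bounds of Theorem 3.1.1 can be checked against a direct simulation of the recursion. The following sketch is illustrative only: the names `fuse` and `even_height_rows` and the initial state $(0.1, 0.1, 0.05)$ are assumptions, the logarithm is taken base 2 (consistent with $k = \log N$), and the initial state is chosen so that it satisfies the definition of $R_{\mathcal{U}}$ given earlier with $q_0 \le \alpha_0, \beta_0$.

```python
import math


def fuse(alpha, beta, q):
    """One step of the map f in (3.5)."""
    w_both = (1.0 - q) / (1.0 + q)
    w_one = 2.0 * q / (1.0 + q)
    if alpha <= beta:
        return (w_both * (2 * alpha - alpha ** 2) + w_one * alpha,
                w_both * beta ** 2 + w_one * beta,
                q * q)
    return (w_both * alpha ** 2 + w_one * alpha,
            w_both * (2 * beta - beta ** 2) + w_one * beta,
            q * q)


def even_height_rows(alpha0, beta0, q0, max_height):
    """Return (height, lower bound, log2(1/P_N), upper bound) for each even height."""
    log_l0_inv = -math.log2(alpha0 + beta0)
    state = (alpha0, beta0, q0)
    rows = []
    for k in range(1, max_height + 1):
        state = fuse(*state)
        if k % 2 == 0:
            sqrt_n = 2.0 ** (k / 2)  # N = 2**k, so sqrt(N) = 2**(k/2)
            value = -math.log2(state[0] + state[1])
            rows.append((k, sqrt_n * (log_l0_inv - 2), value, sqrt_n * (log_l0_inv + 1)))
    return rows
```

Calling `even_height_rows(0.1, 0.1, 0.05, 12)` gives one row per even height from 2 to 12; in each row the simulated value of $\log P_N^{-1}$ is expected to lie between the two bounds.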


For odd-height trees, we need to calculate the decrease in the total error probability
in a single step. For this, we have the following proposition.

Proposition 3.1.5. If $(\alpha_k, \beta_k, q_k) \in \mathcal{U}$, then we have

\[ \frac{L_{k+1}}{L_k^2} \ge 1 \]

and

\[ \frac{L_{k+1}}{L_k} \le 1. \]

Proof. To prove $L_{k+1}/L_k^2 \ge 1$, it suffices to prove that

\[ \frac{1-q_k}{1+q_k}\left(2\alpha_k - \alpha_k^2 + \beta_k^2 - (\alpha_k + \beta_k)^2\right) + \frac{2q_k}{1+q_k}\left(\alpha_k + \beta_k - (\alpha_k + \beta_k)^2\right) \ge 0, \]

which is easy to see.

To prove $L_{k+1}/L_k \le 1$, it suffices to prove that

\[ \frac{1-q_k}{1+q_k}\left(2\alpha_k - \alpha_k^2 + \beta_k^2 - (\alpha_k + \beta_k)\right) + \frac{2q_k}{1+q_k}\left(\alpha_k + \beta_k - (\alpha_k + \beta_k)\right) \le 0, \]

which is easy to see.


□ 

From Propositions 3.1.4 and 3.1.5, we give bounds for the total error probability at
the fusion center for trees with odd height.

Theorem 3.1.2. Suppose that $(\alpha_0, \beta_0, q_0) \in R$ and $\log N$ is odd. Then,

\[ \sqrt{\frac{N}{2}}\left(\log L_0^{-1} - 2\right) \le \log P_N^{-1} \le \sqrt{2N}\left(\log L_0^{-1} + 1\right). \]

Proof. The proof is similar to that of Theorem 3.1.1 and it is omitted.

□ 

3.1.4  Asymptotic  rates 

In  this  section,  we  consider  the  asymptotic  decay  rate  of  the  total  error  probability  with 
respect  to  N.  We  compare  the  rate  with  that  of  balanced  binary  relay  trees  without 
sensor  failures. 

Notice that when $N$ is very large, the sequence $\{(\alpha_k, \beta_k, q_k)\}$ enters the invariant
region $R$ at some level and stays inside afterward. Therefore the decay rate in the
invariant region determines the asymptotic rate. Because our error probability bounds
for odd-height trees differ from those of even-height trees by a constant term, without
loss of generality, we will consider trees with even height to calculate the decay rate.

Proposition 3.1.6. If $L_0 = \alpha_0 + \beta_0$ is fixed, then

\[ \log P_N^{-1} = \Theta(\sqrt{N}). \]

Proof. To analyze the asymptotic rate, we may assume that $L_0 < 1/2$. In this case, the
bounds in Theorem 3.1.1 show that

\[ \log P_N^{-1} = \Theta(\sqrt{N}). \]


□ 

This implies that the convergence of the total error probability is sub-exponential
with decay exponent $\sqrt{N}$. Compared to the decay exponent for the case where $q = 0$
(no sensor failures), the asymptotic rate does not change when we have crummy
sensors, even though the step-wise shrinkage for the crummy sensor case is worse.

Given $L_0 \in (0, 1)$ and $\varepsilon \in (0, 1)$, suppose that we wish to determine how many
sensors we need to have so that $P_N \le \varepsilon$. The solution is simply to find an $N$ (e.g., the
smallest) satisfying the inequality

\[ \sqrt{N}\left(\log L_0^{-1} - 2\right) \ge -\log \varepsilon. \]

The smallest $N$ grows like $\Theta((\log \varepsilon)^2)$ (cf. [33], in which the growth rate is the same,
and [32], where a looser bound was derived).
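
As a worked example of this sizing rule, the inequality can be solved for $N$ directly. The sketch below makes a few assumptions that are not in the report: the function names are hypothetical, logarithms are base 2, and the rule is applied only when $\log L_0^{-1} > 2$ (that is, $L_0 < 1/4$), since otherwise the left-hand side of the inequality is not positive.

```python
import math


def smallest_n(l0, eps):
    """Smallest integer N with sqrt(N) * (log2(1/l0) - 2) >= -log2(eps)."""
    margin = math.log2(1.0 / l0) - 2.0
    if margin <= 0:
        raise ValueError("the inequality has no solution unless L_0 < 1/4")
    return math.ceil((-math.log2(eps) / margin) ** 2)


def smallest_even_height_n(l0, eps):
    """Smallest N of the form 4**m (an even-height tree) that is >= smallest_n."""
    target = smallest_n(l0, eps)
    n = 1
    while n < target:
        n *= 4
    return n
```

For instance, with $L_0 = 2^{-4}$ and $\varepsilon = 2^{-10}$ the inequality reads $2\sqrt{N} \ge 10$, so any $N \ge 25$ satisfies it, and the smallest even-height tree that does so has $N = 64$ sensors.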


3.2  Communication  Link  Failures 


Next  we  consider  the  detection  performance  of  balanced  binary  relay  trees  with  failure- 
prone  communication  links. 


3.2.1  Problem  formulation 

We assume that all communication links between nodes at height $k$ and height $k+1$
have identical failure probability $\ell_k$. As a result of the communication failure, with
a certain probability each node at level $k$ in the tree does not have any data, which
we denote by $p_k$. Assuming equal prior probabilities, we use the likelihood-ratio test
[41] with unit threshold when fusing binary messages at the relay nodes and the fusion
center.


Figure 3.4: A balanced binary relay tree with height k. Circles represent sensors making
measurements. Diamonds represent relay nodes which fuse binary messages. The
rectangle at the root represents the fusion center making an overall decision.

Consider the simple problem of fusing binary messages passed to a node by its
two immediate child nodes. Assume that the two child nodes have identical Type I
error probability $\alpha$, identical Type II error probability $\beta$, and identical node failure
probability $p$. Moreover, assume that the two communication links connecting the child
nodes to the parent node fail with identical probability $\ell$. We can show that for each
child node, the parent node will not receive any data from it with a certain probability,
which we denote by $q$ (henceforth, we call this the node failure probability) and which
is given by

\[ q = p + (1 - p)\ell. \]

Denote the Type I and Type II error probabilities after the fusion by $\alpha'$ and $\beta'$. The
probability that the parent node does not have any data is

\[ p' = (p + (1 - p)\ell)^2 = q^2. \]
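
The two relations above, $q = p + (1-p)\ell$ and $p' = q^2$, determine how the node failure probability propagates up the tree. The sketch below iterates them; the function name `node_failure_sequence` and the convention that the $k$-th entry of `link_failures` applies to the links leaving level $k$ are assumptions made for illustration.

```python
def node_failure_sequence(p0, link_failures):
    """Track (p_k, q_k) level by level.

    p0: probability that a node at level 0 has no data.
    link_failures: one link failure probability per level; entry k is
        assumed to apply to the links leaving level k.
    """
    p = p0
    history = []
    for ell in link_failures:
        q = p + (1.0 - p) * ell  # the child contributes nothing to its parent
        history.append((p, q))
        p = q * q                # the parent has no data only if both children contribute nothing
    return history
```

With a constant link failure probability the sequence $q_k$ never drops below that constant, whereas with quadratically decaying link failure probabilities, such as `[0.2 ** (2 ** k) for k in range(6)]`, it decays rapidly toward 0.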


If the parent node receives data from only one of the child nodes, then the Type I
and Type II error probabilities do not change since the parent node receives only one
binary message. The probability of this event is $2q(1-q)$, in which case we have

\[ (\alpha', \beta') = (\alpha, \beta). \]

If the parent node receives messages from both child nodes, then the scenario is the
same as that in [32] and [33]. The probability of this event is $(1-q)^2$, in which case
we have

\[
(\alpha', \beta') =
\begin{cases}
\left(1-(1-\alpha)^2,\ \beta^2\right), & \alpha \le \beta, \\
\left(\alpha^2,\ 1-(1-\beta)^2\right), & \alpha > \beta.
\end{cases}
\]

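Let $\alpha'$ and $\beta'$ be the mean Type I and Type II error probabilities conditioned on the
event that the parent node receives at least one message from its child nodes, i.e., the
parent node has data. We have

\[
(\alpha', \beta', q') = f(\alpha, \beta, q) =
\begin{cases}
\left( \dfrac{(1-q)(2\alpha-\alpha^2)+2q\alpha}{1+q},\ \dfrac{(1-q)\beta^2+2q\beta}{1+q},\ q^2 + (1-q^2)\ell \right), & \text{if } \alpha \le \beta, \\[2ex]
\left( \dfrac{(1-q)\alpha^2+2q\alpha}{1+q},\ \dfrac{(1-q)(2\beta-\beta^2)+2q\beta}{1+q},\ q^2 + (1-q^2)\ell \right), & \text{if } \alpha > \beta.
\end{cases}
\]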


Our assumption is that all sensors have the same error probabilities $(\alpha_0, \beta_0, q_0)$.
Therefore by the above recursion, all relay nodes at level 1 will have the same error
probability triplet $(\alpha_1, \beta_1, q_1) = f(\alpha_0, \beta_0, q_0)$ (where $\alpha_1$ and $\beta_1$ are the conditional
mean error probabilities). Similarly we can calculate error probability triplets for nodes
at all other levels. We have

\[ (\alpha_{k+1}, \beta_{k+1}, q_{k+1}) = f(\alpha_k, \beta_k, q_k), \quad k = 1, 2, \ldots, \tag{3.7} \]

where $(\alpha_k, \beta_k, q_k)$ is the error probability triplet of nodes at the $k$-th level of the tree.

The relation (3.7) allows us to consider $(\alpha_k, \beta_k, q_k)$ as a discrete dynamic system.
For the case where $\ell_k = 0$ for all $k$, we have studied (see [33]) the precise evolution
of the sequence $\{(\alpha_k, \beta_k)\}$, derived total error probability bounds as functions of $N$,
and established asymptotic decay rates. In this report, we study the case where $\ell_k \neq 0$.
We derive total error probability bounds and determine the decay rate of the total error
probability.

We start by looking at the single trajectory shown in Fig. 3.5(a), with the communication
failure probabilities given by $\ell_{k+1} = \ell_k^2$. We observe that $q_k$ decreases very
quickly to 0. In addition, as shown in Fig. 3.5(b), the trajectory approaches $\beta = \alpha$ at
the beginning. After $(\alpha_k, \beta_k)$ gets too close to $\beta = \alpha$, the next pair $(\alpha_{k+1}, \beta_{k+1})$ will
be repelled toward the other side of the line $\beta = \alpha$. This behavior is similar to the
non-failure scenario, in which case there exists an invariant region in the sense that the
system stays in the invariant region once the system enters it [33]. Is there an invariant
region for the case where $q \neq 0$? We answer this question affirmatively by precisely
describing this invariant region in $\mathbb{R}^3$.


Figure 3.5: (a) A typical trajectory of $(\alpha_k, \beta_k, q_k)$ in the $(\alpha, \beta, q)$ coordinates. (b) The
trajectory in (a) projected onto the $(\alpha, \beta)$ plane.


3.2.2  The evolution of Type I, Type II, and node failure probabilities

The relation (3.7) is symmetric about the hyperplanes $\alpha + \beta = 1$ and $\beta = \alpha$. Thus,
it suffices to study the evolution of the dynamic system only in the region bounded by
$\alpha + \beta < 1$, $\beta \ge \alpha$, and $0 \le q \le 1$. Let

\[ \mathcal{U} := \{ (\alpha, \beta, q) \ge 0 \mid \alpha + \beta < 1,\ \beta \ge \alpha,\ \text{and } 0 \le q \le 1 \} \]

be this triangular prism. Similarly, define the complementary triangular prism

\[ \mathcal{L} := \{ (\alpha, \beta, q) \ge 0 \mid \alpha + \beta < 1,\ \beta < \alpha,\ \text{and } 0 \le q \le 1 \}. \]

First, we introduce the following region:

\[ B_1 := \left\{ (\alpha, \beta, q) \in \mathcal{U} \;\middle|\; \beta < -\frac{q}{1-q} + \frac{\sqrt{q^2 + (1-q)^2(2\alpha - \alpha^2) + 2q(1-q)\alpha}}{1-q} \right\}. \]

If $(\alpha_k, \beta_k, q_k) \in B_1$, then the next triplet $(\alpha_{k+1}, \beta_{k+1}, q_{k+1})$ jumps across the plane
$\beta = \alpha$ away from $(\alpha_k, \beta_k, q_k)$. More precisely, if $(\alpha_k, \beta_k, q_k) \in \mathcal{U}$, then $(\alpha_k, \beta_k, q_k) \in B_1$
if and only if $(\alpha_{k+1}, \beta_{k+1}, q_{k+1}) \in \mathcal{L}$. In other words, $B_1$ is the inverse image of
$\mathcal{L}$ in $\mathcal{U}$ under the mapping $f$.

It is easy to see that if we start with $(\alpha_0, \beta_0, q_0) \in \mathcal{U} \setminus B_1$, then before the system enters
$B_1$, we have $\alpha_{k+1} > \alpha_k$ and $\beta_{k+1} < \beta_k$. Thus, the system moves towards the $\beta = \alpha$
plane. Therefore, if the number of sensors $N$ is sufficiently large, then the system is
guaranteed to enter $B_1$.

Next we consider the behavior of the system after it enters $B_1$. If $(\alpha_k, \beta_k, q_k) \in B_1$,
we consider the position of the next triplet $(\alpha_{k+1}, \beta_{k+1}, q_{k+1})$, i.e., consider the image
of $B_1$ under $f$, denoted by $R_{\mathcal{L}}$. Similarly we denote the reflection of $R_{\mathcal{L}}$ with respect
to $\beta = \alpha$ by $R_{\mathcal{U}}$. We find that

\[ R_{\mathcal{U}} := \left\{ (\alpha, \beta, q) \in \mathcal{U} \;\middle|\; \beta \le -\alpha + \frac{2\left(\sqrt{q^2 + (1-q^2)\alpha} - q\right)}{1-q} \right\}. \]

The sets $R_{\mathcal{U}}$ and $B_1$ have some interesting properties. We denote the projection of
the upper boundary of $R_{\mathcal{U}}$ and $B_1$ onto the $(\alpha, \beta)$ plane for a fixed $q$ by $R_{\mathcal{U}}^{q}$ and $B_1^{q}$,
respectively. It is easy to see that if $q_1 < q_2$, then $R_{\mathcal{U}}^{q_1}$ lies above $R_{\mathcal{U}}^{q_2}$ in the $(\alpha, \beta)$
plane. Similarly, if $q_1 < q_2$, then $B_1^{q_1}$ lies above $B_1^{q_2}$ in the $(\alpha, \beta)$ plane. Moreover,
we have the following proposition.

Proposition 3.2.1. $B_1 \subset R_{\mathcal{U}}$.

Proof. The proof is similar to that of Proposition 3.1.1 and it is omitted.

□

We denote the region $R_{\mathcal{U}} \cup R_{\mathcal{L}}$ by $R$. We show below that $R$ is an invariant region
in the sense that once the system enters $R$, it stays there.

Proposition 3.2.2. If $(\alpha_{k_0}, \beta_{k_0}, q_{k_0}) \in R$ for some $k_0$ and $\{q_k\}$ decreases monotonically
for $k \ge k_0$, then $(\alpha_k, \beta_k, q_k) \in R$ for all $k \ge k_0$.


From  the  above  proposition,  we  can  study  the  reduction  of  the  total  error  probability 
when  the  system  lies  in  R  to  determine  the  asymptotic  decay  rate. 


First,  we  compare  the  step-wise  reduction  of  the  total  error  probability  between 
the  case  where  communication  links  fail  with  certain  probabilities  (failure  case)  and 
the  case  where  the  network  has  no  communication  link  failures  (non-failure  case).  We 
show  that  if  the  communication  links  are  unreliable,  then  the  decay  of  the  total  error 
probability  for  a  single  step  is  slower  than  the  non-failure  case. 


Proposition 3.2.3. Let $L_{k+1}^{(q)}$ be (twice) the total error probability at the next level
from the current state $(\alpha_k, \beta_k, q)$. Suppose that $(\alpha_k, \beta_k, q_1)$ and $(\alpha_k, \beta_k, q_2) \in \mathcal{U}$. If
$q_1 < q_2$, then

\[ L_{k+1}^{(q_1)} \le L_{k+1}^{(q_2)}, \]

with equality if and only if $\alpha_k = \beta_k$.
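
Proof. From the recursion, we have

\[ L_{k+1}^{(q)} = \frac{1-q}{1+q}\, L_{k+1}^{(0)} + \frac{2q}{1+q}\,(\alpha_k + \beta_k), \]

where $L_{k+1}^{(0)} = 2\alpha_k - \alpha_k^2 + \beta_k^2$.

It is easy to show that $2\alpha_k - \alpha_k^2 + \beta_k^2 \le \alpha_k + \beta_k$. Indeed,

\[ 2\alpha_k - \alpha_k^2 + \beta_k^2 \le \alpha_k + \beta_k \iff \alpha_k - \alpha_k^2 \le \beta_k - \beta_k^2. \]

Since $\alpha_k + \beta_k < 1$ and $\beta_k \ge \alpha_k$, we have $\beta_k - 1/2 < 1/2 - \alpha_k$ and $1/2 - \beta_k \le 1/2 - \alpha_k$,
so $\beta_k$ is at least as close to $1/2$ as $\alpha_k$ is. Notice that the
function $x - x^2$ peaks at $x = 1/2$. Hence, $2\alpha_k - \alpha_k^2 + \beta_k^2 \le \alpha_k + \beta_k$ with equality if
and only if $\alpha_k = \beta_k$.

Notice that

\[ \frac{1-q}{1+q} + \frac{2q}{1+q} = 1. \]

Therefore we can write

\[ L_{k+1}^{(q_1)} = \pi_1 L_{k+1}^{(0)} + (1 - \pi_1)(\alpha_k + \beta_k), \]

where $\pi_1 = (1 - q_1)/(1 + q_1)$. Let $\pi_2 = (1 - q_2)/(1 + q_2)$; it is easy to see that
$\pi_1 > \pi_2$. Thus we have

\begin{align*}
L_{k+1}^{(q_1)} &= \pi_1 L_{k+1}^{(0)} + (1 - \pi_1)(\alpha_k + \beta_k) + (\pi_2 - \pi_1) L_{k+1}^{(0)} - (\pi_2 - \pi_1) L_{k+1}^{(0)} \\
&\le \pi_1 L_{k+1}^{(0)} + (1 - \pi_1)(\alpha_k + \beta_k) + (\pi_2 - \pi_1) L_{k+1}^{(0)} - (\pi_2 - \pi_1)(\alpha_k + \beta_k) \\
&= L_{k+1}^{(q_2)}.
\end{align*}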

□ 


From Proposition 3.2.3, we immediately deduce that if $q > 0$, then

\[ L_{k+1}^{(0)} \le L_{k+1}^{(q)}, \]

which means that the decay of the total error probability for a single step is the fastest
if the failure probability is 0 (i.e., the non-failure case). In other words, for the failure
case, the step-wise shrinkage of the total error probability cannot be faster than the
non-failure case, where the total error probability decays to 0 with exponent $\sqrt{N}$ [33].


Next we assume that the communication failure probabilities are identical at all
levels, that is, $\ell_k = C$ for all $k$, where $C \in (0, 1)$. Denote $L_k = \alpha_k + \beta_k$ to be (twice)
the total error probability for nodes at level $k$. Let $P_N = L_{\log N}$, which is (twice) the
total error probability at the fusion center. We provide an upper bound for $\log P_N^{-1}$.

Theorem 3.2.1. Suppose that $(\alpha_0, \beta_0, q_0) \in R$ and $\ell_k = C$ for all $k$, where $C \in (0, 1)$.
Then,

\[ \log P_N^{-1} \le \sqrt{N}\left(\log L_0^{-1} + 1\right). \]

Theorem 3.2.1 provides an upper bound for $\log P_N^{-1}$. Moreover, we can show that
in the asymptotic regime,

\[ \log P_N^{-1} = o(\sqrt{N}). \]

This implies that the convergence rate is strictly slower than $\sqrt{N}$ (note that the convergence
rate for the non-failure case is exactly $\sqrt{N}$).


Finally, we assume that the failure probabilities decay quadratically to 0, that is, $\ell_{k+1} = \ell_k^2$, where $k = 0, 1, \ldots, \log N - 1$. In consequence, if $\alpha_0$, $\beta_0$, and $q_0$ are fixed, then we have $q_k < \alpha_k$ and $q_k < \beta_k$ for sufficiently large $k$. With these, we derive upper and lower bounds for $\log P_N^{-1}$.

Proposition 3.2.4. Suppose that $(\alpha_k, \beta_k, q_k) \in R$, $\alpha_k > q_k$, and $\beta_k > q_k$. Then,

\[ \frac{1}{2} \le \frac{L_{k+2}}{L_k^2} \le 4. \]

Proof. The proof is similar to that of Proposition 3.1.4 and is omitted.


□ 

Proposition 3.2.4 gives rise to bounds on the change in the total error probability every two steps: $L_{k+2} \le 4 L_k^2$ and $L_{k+2} \ge L_k^2 / 2$. From these, we can derive bounds for $\log P_N^{-1}$ for even-height trees, i.e., when $\log N$ is even.

Theorem 3.2.2. Suppose that $(\alpha_0, \beta_0, q_0) \in R$ and $\ell_{k+1} = \ell_k^2$, where $k = 0, 1, \ldots, \log N - 1$. If $\log N$ is even, then

\[ \sqrt{N}\,(\log L_0^{-1} - 2) \le \log P_N^{-1} \le \sqrt{N}\,(\log L_0^{-1} + 1). \]

Proof. If $(\alpha_0, \beta_0, q_0) \in R$, then we have $(\alpha_k, \beta_k, q_k) \in R$ for $k = 0, 1, \ldots, \log N - 2$.

From Proposition 3.2.4, we have

\[ L_{k+2} = a_k L_k^2 \]

for $k = 0, 1, \ldots, \log N - 2$ and some $a_k \in [1/2, 4]$. Therefore, for $k = 2, 4, \ldots, \log N$, we have

\[ L_k = a_{(k-2)/2} \cdot a_{(k-4)/2}^{2} \cdots a_0^{2^{(k-2)/2}} L_0^{2^{k/2}}, \]

where $a_i \in [1/2, 4]$. Substituting $k = \log N$, so that $2^{k/2} = \sqrt{N}$, we have

\[ \log P_N^{-1} = -\log a_{(k-2)/2} - 2 \log a_{(k-4)/2} - \cdots - 2^{(k-2)/2} \log a_0 + \sqrt{N} \log L_0^{-1}. \]

Notice that $\log L_0^{-1} > 0$ and for each $i$, $-1 \le \log a_i \le 2$. The coefficients $1, 2, \ldots, 2^{(k-2)/2}$ sum to $\sqrt{N} - 1 < \sqrt{N}$. Thus,

\[ \log P_N^{-1} \le \sqrt{N} \log L_0^{-1} + \sqrt{N} = \sqrt{N}\,(\log L_0^{-1} + 1). \]

Finally,

\[ \log P_N^{-1} \ge -2\sqrt{N} + \sqrt{N} \log L_0^{-1} = \sqrt{N}\,(\log L_0^{-1} - 2). \]


□ 


For  odd-height  trees,  we  need  to  calculate  the  decrease  in  the  total  error  probability 
in  a  single  step.  For  this,  we  have  the  following  proposition. 

Proposition 3.2.5. If $(\alpha_k, \beta_k, q_k) \in U$, then we have

\[ \frac{L_{k+1}}{L_k^2} \ge 1 \]

and

\[ \frac{L_{k+1}}{L_k} \le 1. \]


From  Propositions  3.2.4  and  3.2.5,  we  give  bounds  for  the  total  error  probability  at 
the  fusion  center  for  trees  with  odd  height. 

Theorem 3.2.3. Suppose that $(\alpha_0, \beta_0, q_0) \in R$ and $\ell_{k+1} = \ell_k^2$, where $k = 0, 1, \ldots, \log N - 1$. If $\log N$ is odd, then

\[ \sqrt{N/2}\,(\log L_0^{-1} - 2) \le \log P_N^{-1} \le \sqrt{2N}\,(\log L_0^{-1} + 1). \]


We have derived error probability bounds for balanced binary relay trees with unreliable communication links. In the next section, we will use these bounds to study the asymptotic rate of convergence.


3.2.3  Asymptotic  rates 

Notice that when $N$ is very large and $\{q_k\}$ decreases monotonically, the sequence $\{(\alpha_k, \beta_k, q_k)\}$ enters the invariant region $R$ at some level and stays inside afterward. In consequence, the decay rate in the invariant region determines the asymptotic rate. Since the error probability bounds for odd-height trees differ from those of even-height trees only in a constant term, without loss of generality we consider trees with even height to calculate the decay rate.

Proposition 3.2.6. Suppose that $L_0 = \alpha_0 + \beta_0$ is fixed and $\{q_k\}$ decreases monotonically. If $q_k < \alpha_k$ and $q_k < \beta_k$ for sufficiently large $k$, then

\[ \log P_N^{-1} = \Theta(\sqrt{N}). \]

This implies that the convergence of the total error probability is sub-exponential with exponent $\sqrt{N}$. Compared to the exponent for the non-failure case, the scaling law of the asymptotic rate does not change when we have unreliable communications, provided the communication failure probabilities decay to 0 sufficiently fast, even though the step-wise shrinkage for the failure case is worse.

Given $L_0 \in (0, 1)$ and $\varepsilon \in (0, 1)$, suppose that we wish to determine how many sensors we need so that $P_N \le \varepsilon$. The solution is simply to find an $N$ (e.g., the smallest) satisfying the inequality

\[ \sqrt{N}\,(\log L_0^{-1} - 2) \ge -\log \varepsilon. \]

The smallest $N$ grows like $\Theta((\log \varepsilon)^2)$ (cf. [33], in which the growth rate is the same, and [32], where a looser bound was derived).
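The sensor-count calculation above can be sketched in a few lines. This is only an illustration under stated assumptions: logarithms are taken base 2 (consistent with $-1 \le \log a_i \le 2$ for $a_i \in [1/2, 4]$), $N$ is restricted to powers of 2 with even $\log N$, and the function name `smallest_sensor_count` is hypothetical. The bound is informative only when $\log L_0^{-1} > 2$, i.e., $L_0 < 1/4$.

```python
import math


def smallest_sensor_count(l0: float, eps: float) -> int:
    """Smallest N = 2**k (k even) with sqrt(N) * (log2(1/l0) - 2) >= log2(1/eps).

    Assumes base-2 logarithms and an even-height balanced binary tree.
    """
    margin = math.log2(1.0 / l0) - 2.0
    if margin <= 0:
        # The lower bound on log P_N^{-1} is vacuous unless L0 < 1/4.
        raise ValueError("requires log2(1/L0) > 2, i.e. L0 < 1/4")
    target = math.log2(1.0 / eps)
    k = 0
    # sqrt(N) = 2**(k/2) is an exact integer because k stays even.
    while (2 ** (k // 2)) * margin < target:
        k += 2
    return 2 ** k


if __name__ == "__main__":
    for eps in (2 ** -8, 2 ** -16, 2 ** -32):
        print(eps, smallest_sensor_count(1 / 16, eps))
```

Doubling $\log \varepsilon^{-1}$ roughly quadruples the returned $N$, which matches the $\Theta((\log \varepsilon)^2)$ growth stated above.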
Chapter 4


Submodularity  and  Optimality 
of  Fusion  Rules 

4.1  Problem  Formulation 


Consider a binary hypothesis testing problem in a balanced binary relay tree with height $h$. The leaf nodes, depicted as circles in Fig. 4.1, are identical sensors making independent measurements. The measurements are compressed into binary messages and forwarded to the parent nodes. Consider a non-leaf node $p$. We denote by $C(p)$ the set of immediate child nodes of $p$; node $p$ receives two binary messages $Y_c \in \{0, 1\}$ for $c \in C(p)$. Then $p$ summarizes the two binary messages into a new binary message $Y_p \in \{0, 1\}$ using fusion rule $\lambda_p$:

\[ Y_p = \lambda_p(\{Y_c : c \in C(p)\}). \]

The new message $Y_p$ is then communicated to the parent node of $p$. Ultimately, the fusion center generates an overall binary decision. In a balanced binary relay tree with binary message alphabet, we already know that the majority rule with random tie-breaking does not change the Type I and II error probabilities [37]. In consequence, the only meaningful rules to aggregate two binary messages in this case are the 'AND' and 'OR' rules, defined as follows:

•  AND rule (denoted by $\mathcal{A}$): the parent node decides 1 if and only if both child nodes send 1;

•  OR rule (denoted by $\mathcal{O}$): the parent node decides 0 if and only if both child nodes send 0.

Notice that ULRT is either the $\mathcal{A}$ rule or the $\mathcal{O}$ rule, depending on the values of the Type I and Type II error probabilities at a particular level of the tree. Henceforth, we choose all fusion rules in the tree from $\mathcal{Y} = \{\mathcal{A}, \mathcal{O}\}$.

We assume that all sensors are identical and independent in this balanced configuration, and that all the nodes at level $k$ use the same fusion rule $\lambda_k$; i.e., $\lambda_p = \lambda_k$ for every node $p$ at the $k$-th level. In this case, we have shown in [33] that all the nodes at level $k$ have the same Type I (false alarm) and Type II (missed detection) error probabilities, which we denote by $\alpha_k$ and $\beta_k$, respectively.

We denote by $\pi^h = (\lambda_1, \lambda_2, \ldots, \lambda_h)$ a fusion strategy, where $\lambda_j \in \mathcal{Y}$ denotes the fusion rule for nodes at level $j$. Let the collection of all possible fusion strategies $\pi^h$ be $\mathcal{Y}^h$, which can be written as

\[ \mathcal{Y}^h = \underbrace{\mathcal{Y} \times \mathcal{Y} \times \cdots \times \mathcal{Y}}_{h \text{ times}}. \]

Note that for a given initial error probability pair $(\alpha_0, \beta_0)$, the pair $(\alpha_h, \beta_h)$ depends on the strategy $\pi^h$.


Figure 4.1: A balanced binary relay tree with height h. Circles represent sensors making measurements. Diamonds represent relay nodes which fuse binary messages. The rectangle at the root represents the fusion center making an overall decision.

For simplicity, we assume that the prior probabilities of the two hypotheses are equal. The global objective is to minimize the total error probability at the fusion center for a given initial error probability pair $(\alpha_0, \beta_0)$, namely, to maximize (twice) the reduction of the total error probability after all the fusions. We call this optimization problem an $h$-optimal problem, which is defined as follows:

\[
\begin{aligned}
v_h^*(\alpha_0, \beta_0) &= \max_{\pi^h \in \mathcal{Y}^h} \bigl(\alpha_0 + \beta_0 - (\alpha_h + \beta_h)\bigr) \\
&= \max_{\pi^h \in \mathcal{Y}^h} \sum_{j=0}^{h-1} \bigl(\alpha_j + \beta_j - (\alpha_{j+1} + \beta_{j+1})\bigr).
\end{aligned}
\]

The $h$-optimal fusion strategy is defined as

\[
\begin{aligned}
\pi^{h*}(\alpha_0, \beta_0) &= \arg\max_{\pi^h \in \mathcal{Y}^h} \bigl(\alpha_0 + \beta_0 - (\alpha_h + \beta_h)\bigr) \\
&= \arg\max_{\pi^h \in \mathcal{Y}^h} \sum_{j=0}^{h-1} \bigl(\alpha_j + \beta_j - (\alpha_{j+1} + \beta_{j+1})\bigr).
\end{aligned}
\]

On the other hand, ULRT is the 1-optimal fusion rule, which maximizes the level-wise reduction of the total error probability:

\[ \text{ULRT} = \arg\max_{\lambda \in \mathcal{Y}} \bigl(\alpha_i + \beta_i - (\alpha_{i+1} + \beta_{i+1})\bigr) \quad \forall i. \]

Note that in this case, the maximum a posteriori fusion rule coincides with ULRT. This fusion rule is also known as the greedy rule in the literature. In this context, we call a fusion strategy a ULRT strategy if each fusion rule of the strategy is a ULRT.

In the next section, we derive the explicit fusion strategy for a balanced binary relay tree with height $h$. We then show that the 2-optimal strategy is essentially equivalent to the ULRT strategy. Moreover, we show that the reduction of the total error probability is a submodular function, which implies that the ULRT strategy is close to the optimal fusion strategy.


4.2  Main  Results 

4.2.1  A  dynamic  programming  formulation 

In this section, we formulate the problem using a deterministic dynamic programming model. First we define the necessary elements of this model.

•  Dynamic System: We define the error probability pair at the $k$-th level, $s_k = (\alpha_k, \beta_k)$, to be the system state. Notice that $\alpha_k$ and $\beta_k$ can only take values in the interval $[0, 1]$. Therefore, the set of all the states is $[0, 1] \times [0, 1]$. Moreover, given the fusion rule, the state transition function is deterministic. For example, letting $s_{k-1} = (\alpha_{k-1}, \beta_{k-1})$, if we choose $\lambda_k = \mathcal{A}$, then

\[ (\alpha_k, \beta_k) = f(\alpha_{k-1}, \beta_{k-1}) = \bigl(1 - (1 - \alpha_{k-1})^2,\ \beta_{k-1}^2\bigr). \]

On the other hand, if we choose $\lambda_k = \mathcal{O}$, then

\[ (\beta_k, \alpha_k) = f(\beta_{k-1}, \alpha_{k-1}) = \bigl(1 - (1 - \beta_{k-1})^2,\ \alpha_{k-1}^2\bigr). \]

•  Rewards: At each level $k$, we define the instantaneous reward to be the reduction of the total error probability after fusing with $\lambda_k$:

\[ r(s_{k-1}, \lambda_k) = (\alpha_{k-1} + \beta_{k-1}) - (\alpha_k + \beta_k), \]

where $\alpha_k$ and $\beta_k$ are functions of the previous state $s_{k-1}$ and the fusion rule $\lambda_k$.

Let $v_{h-k}(s)$ denote the cumulative reduction of the total error probability if we start the system at state $s_k = s$ at level $k$ and the strategy $(\lambda_{k+1}, \lambda_{k+2}, \ldots, \lambda_h)$ is used. Following the above definitions, we have

\[ v_{h-k}(s) = \sum_{j=k+1}^{h} r(s_{j-1}, \lambda_j) \Big|_{s_k = s}. \]

If we let $k = 0$, that is, we start calculating the reduction from the sensor level, then the above cumulative reward function is the same as the global objective function defined in Section 4.1. In consequence, for a given initial state $s_0$, we have to solve the following optimization problem to find the globally optimal strategy over horizon $h$:

\[ v_h^*(s_0) = \max_{\pi^h \in \mathcal{Y}^h} \sum_{j=1}^{h} r(s_{j-1}, \lambda_j) \Big|_{s_0}. \]

The globally optimal strategy $\pi^{h*}$, which is a combination of fusion rules for all levels, can be written as

\[ \pi^{h*}(s_0) = \arg\max_{\pi^h \in \mathcal{Y}^h} \sum_{j=1}^{h} r(s_{j-1}, \lambda_j) \Big|_{s_0}. \]

Notice that the state at the $k$-th level, $s_k$, depends on the previous state $s_{k-1}$ and the fusion rule $\lambda_k$. Hence we write the state at level $k$ as $s_k|_{s_{k-1}, \lambda_k}$. The solution of the above optimization problem can be characterized using Bellman's equation, which states that

\[ v_h^*(s_0) = \max_{\lambda_1 \in \mathcal{Y}} \bigl[ r(s_0, \lambda_1) + v_{h-1}^*(s_1|_{s_0, \lambda_1}) \bigr]. \]

Moreover,

\[ \lambda_1^*(s_0) = \arg\max_{\lambda_1 \in \mathcal{Y}} \bigl[ r(s_0, \lambda_1) + v_{h-1}^*(s_1|_{s_0, \lambda_1}) \bigr] \]

is the first element of the optimal strategy $\pi^{h*}$. Recursively, the solution of the optimization problem is

\[ v_{h-(k-1)}^*(s_{k-1}) = \max_{\lambda_k \in \mathcal{Y}} \bigl[ r(s_{k-1}, \lambda_k) + v_{h-k}^*(s_k|_{s_{k-1}, \lambda_k}) \bigr]. \]

Moreover, the optimal fusion rule at level $k$ is

\[ \lambda_k^*(s_{k-1}) = \arg\max_{\lambda_k \in \mathcal{Y}} \bigl[ r(s_{k-1}, \lambda_k) + v_{h-k}^*(s_k|_{s_{k-1}, \lambda_k}) \bigr]. \]

The explicit solution of the above set of equations requires dynamic programming, and the computational complexity grows exponentially with respect to the horizon.
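The Bellman recursion can be illustrated with a short recursive sketch. It is not the author's implementation: the names `step`, `reward`, and `optimal_value` are hypothetical, and the state transitions are taken exactly as written in the Dynamic System item above (rule "A" maps $(\alpha, \beta)$ to $(1-(1-\alpha)^2, \beta^2)$ and rule "O" maps it to $(\alpha^2, 1-(1-\beta)^2)$). The recursion visits $2^h$ rule sequences, which reflects the exponential growth in the horizon.

```python
RULES = ("A", "O")


def step(state, rule):
    """State transition (alpha, beta) -> next state, as stated in Section 4.2.1."""
    alpha, beta = state
    if rule == "A":
        return (1 - (1 - alpha) ** 2, beta ** 2)
    if rule == "O":
        return (alpha ** 2, 1 - (1 - beta) ** 2)
    raise ValueError(f"unknown rule: {rule}")


def reward(state, rule):
    """Instantaneous reward: reduction of alpha + beta after one fusion."""
    nxt = step(state, rule)
    return (state[0] + state[1]) - (nxt[0] + nxt[1])


def optimal_value(state, horizon):
    """Return (v*, optimal rule sequence) via the Bellman recursion."""
    if horizon == 0:
        return 0.0, ()
    best_value, best_rules = None, None
    for rule in RULES:
        tail_value, tail_rules = optimal_value(step(state, rule), horizon - 1)
        value = reward(state, rule) + tail_value
        if best_value is None or value > best_value:
            best_value, best_rules = value, (rule,) + tail_rules
    return best_value, best_rules


if __name__ == "__main__":
    print(optimal_value((0.2, 0.3), 2))
```

For the initial state $(0.2, 0.3)$ and horizon 2, the recursion selects the sequence ("A", "O").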


4.2.2  2-optimal  fusion  strategy 

In this section, we show that the 2-optimal fusion strategy is equivalent to the ULRT strategy. However, ULRT is not $k$-optimal for $k > 2$ in general. First consider the 2-optimal problem; i.e., $k = 2$:

\[ v_2^*(s_0) = \max_{\pi^2 \in \mathcal{Y}^2} \sum_{j=1}^{2} r(s_{j-1}, \lambda_j) \Big|_{s_0}, \]

where $\mathcal{Y}^2 = \{(\mathcal{A}, \mathcal{A}), (\mathcal{A}, \mathcal{O}), (\mathcal{O}, \mathcal{O}), (\mathcal{O}, \mathcal{A})\}$. We have the following theorem.

Theorem 4.2.1. $\pi^2$ is a 2-optimal fusion strategy if and only if $\pi^2$ is a ULRT strategy.

Proof. Suppose that the initial state is $(\alpha_0, \beta_0)$. The ULRT fusion rule reduces to $\mathcal{A}$ or $\mathcal{O}$ in the following way:

•  ULRT $= \mathcal{A}$ if $\beta_k > \alpha_k$,

•  ULRT $= \mathcal{O}$ if $\beta_k < \alpha_k$.

It is easy to show that the total error probability decreases after fusing with ULRT. However, if we apply a fusion rule other than ULRT, then we can show that the total error probability increases after fusion. For example, if $\beta_k > \alpha_k$ and we apply the $\mathcal{O}$ fusion rule, then the detection error probability increases:

\[ \alpha_{k+1} + \beta_{k+1} = \alpha_k^2 + 1 - (1 - \beta_k)^2 \ge \alpha_k + \beta_k. \]

In other words, the instantaneous reward is non-positive,

\[ r(s_k, \mathcal{O}) \le 0. \]


Moreover, because of symmetry, it suffices to prove this theorem in the upper triangular region $U = \{(\alpha, \beta) \ge 0 \mid \alpha + \beta < 1 \text{ and } \beta \ge \alpha\}$ (see Fig. 4.2). Also recall that if $(\alpha_k, \beta_k) \in B_1$, where $B_1 := \{(\alpha, \beta) \in U \mid (1 - \alpha)^2 + \beta^2 \le 1\}$ (see Fig. 4.2), then the next state $(\alpha_{k+1}, \beta_{k+1}) \in \mathcal{L}$. Therefore, we divide our analysis into two parts:

•  Case I: $(\alpha_0, \beta_0) \in B_1$, in which case the ULRT strategy is $(\mathcal{A}, \mathcal{O})$;

•  Case II: $(\alpha_0, \beta_0) \in U \setminus B_1$, in which case the ULRT fusion strategy is $(\mathcal{A}, \mathcal{A})$.

For Case I where $(\alpha_0, \beta_0) \in B_1$, it is easy to see that strategy $(\mathcal{A}, \mathcal{O})$ achieves a larger reduction than that of $(\mathcal{A}, \mathcal{A})$, because using the $\mathcal{A}$ rule for the second level increases the total error probability. Moreover, the total error probability after using $(\mathcal{O}, \mathcal{O})$ increases with respect to the initial one. Hence, this fusion strategy is excluded. It suffices to show that the strategy $(\mathcal{A}, \mathcal{O})$ achieves a larger reduction than that of $(\mathcal{O}, \mathcal{A})$:

\[ r(s_0, \mathcal{A}) + r(s_1, \mathcal{O}) \ge r(s_0, \mathcal{O}) + r(s_1, \mathcal{A}), \]

which is equivalent to the following inequality:

\[
\begin{aligned}
& r(s_0, \mathcal{A}) + r(s_1, \mathcal{O}) - \bigl(r(s_0, \mathcal{O}) + r(s_1, \mathcal{A})\bigr) \\
&\quad = \bigl(1 - (1 - \beta_0)^2\bigr)^2 + 1 - (1 - \alpha_0^2)^2 - \Bigl(\bigl(1 - (1 - \alpha_0)^2\bigr)^2 + 1 - (1 - \beta_0^2)^2\Bigr) \ge 0.
\end{aligned}
\]

The above inequality can be reduced to

\[ \beta_0^2 (1 - \beta_0)^2 - \alpha_0^2 (1 - \alpha_0)^2 \ge 0, \]

which holds for all $(\alpha_0, \beta_0) \in B_1$.

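For Case II where $(\alpha_0, \beta_0) \in U \setminus B_1$, it is easy to see that strategy $(\mathcal{A}, \mathcal{A})$ achieves a larger reduction than that of $(\mathcal{A}, \mathcal{O})$, because using the $\mathcal{O}$ rule for the second level increases the total error probability. Moreover, the total error probability after using $(\mathcal{O}, \mathcal{O})$ increases with respect to the initial one. Hence, this fusion strategy is excluded. It suffices to show that the strategy $(\mathcal{A}, \mathcal{A})$ achieves a larger reduction than that of $(\mathcal{O}, \mathcal{A})$:

\[ r(s_0, \mathcal{A}) + r(s_1, \mathcal{A}) \ge r(s_0, \mathcal{O}) + r(s_1, \mathcal{A}), \]

which reduces to

\[
\begin{aligned}
& r(s_0, \mathcal{A}) + r(s_1|_{s_0, \mathcal{A}}, \mathcal{A}) - \bigl(r(s_0, \mathcal{O}) + r(s_1|_{s_0, \mathcal{O}}, \mathcal{A})\bigr) \\
&\quad = \bigl(1 - (1 - \beta_0)^2\bigr)^2 + 1 - (1 - \alpha_0^2)^2 - \bigl(1 - (1 - \alpha_0)^4 + \beta_0^4\bigr) \ge 0.
\end{aligned}
\]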

The above inequality is equivalent to

\[ \beta_0^2 (1 - \beta_0) - \alpha_0 (1 - \alpha_0)^2 \ge 0, \]

which holds for all $(\alpha_0, \beta_0) \in U \setminus B_1$.


□ 

We have shown that the ULRT strategy, which maximizes the step-wise reduction in the total error probability, is also a 2-optimal fusion strategy. However, the ULRT strategy is not in general optimal for multiple levels; i.e., $h > 2$. Next we provide a numerical example showing that the ULRT strategy is not 3-optimal. Let the initial state be $(\alpha_0, \beta_0) = (0.2, 0.3)$. The ULRT strategy in this case is $(\mathcal{A}, \mathcal{O}, \mathcal{A})$. In Fig. 4.3, the red line shows the total error probabilities at each level up to 3 under this strategy. However, the 3-optimal strategy in this case is $(\mathcal{O}, \mathcal{A}, \mathcal{A})$. The total error probability curve of this strategy is shown as a green dashed line in Fig. 4.3.
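The numerical example can be reproduced by brute-force enumeration of all $2^3$ strategies. The sketch below is illustrative only: the helper names are hypothetical, the transitions follow the Dynamic System definition in Section 4.2.1, and the ULRT rule is applied as stated in the proof of Theorem 4.2.1 ("A" when $\beta_k > \alpha_k$, "O" when $\beta_k < \alpha_k$; the tie case is resolved to "A" here as an arbitrary assumption).

```python
from itertools import product


def step(state, rule):
    """Transition of (alpha, beta) under rule 'A' or 'O' (Section 4.2.1)."""
    alpha, beta = state
    if rule == "A":
        return (1 - (1 - alpha) ** 2, beta ** 2)
    return (alpha ** 2, 1 - (1 - beta) ** 2)


def total_error_after(state, strategy):
    """alpha + beta (twice the total error probability) after applying a strategy."""
    for rule in strategy:
        state = step(state, rule)
    return state[0] + state[1]


def ulrt_strategy(state, height):
    """Greedy (level-wise) rule selection as described for ULRT."""
    rules = []
    for _ in range(height):
        rule = "A" if state[1] >= state[0] else "O"
        rules.append(rule)
        state = step(state, rule)
    return tuple(rules)


def best_strategy(state, height):
    """Exhaustive search over all 2**height rule sequences."""
    return min(product("AO", repeat=height),
               key=lambda s: total_error_after(state, s))


if __name__ == "__main__":
    s0 = (0.2, 0.3)
    for name, strat in (("ULRT", ulrt_strategy(s0, 3)),
                        ("3-optimal", best_strategy(s0, 3))):
        print(name, strat, round(total_error_after(s0, strat), 6))
```

With these transitions, the greedy sequence ends at $\alpha_3 + \beta_3 \approx 0.272$, while the exhaustive search ends at approximately $0.218$, in agreement with the strategies named in the text.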


4.2.3  Submodularity 

Consider a balanced binary relay tree with height $2h$. We assume that the fusion rules of every two consecutive levels are chosen from the set $Z = \{(\mathcal{A}, \mathcal{O}), (\mathcal{O}, \mathcal{A})\}$. Let $\pi = (\Lambda_1, \Lambda_2, \ldots, \Lambda_h)$ be the overall fusion strategy, where $\Lambda_i \in Z$. In this case, the reduction of the total error probability is

\[ u_h(\Lambda_1, \Lambda_2, \ldots, \Lambda_h) = \alpha_0 + \beta_0 - (\alpha_{2h} + \beta_{2h}). \]

Figure 4.3: Comparison of the ULRT strategy and the 3-optimal strategy.

The global optimization problem is to select $\Lambda_i \in Z$ such that the above reduction is maximized, that is,

\[ u_h^* = \max_{\pi \in Z^h} u_h(\Lambda_1, \Lambda_2, \ldots, \Lambda_h). \]

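Next we show an important property of the reduction in the total error probability.

Proposition 1: $u_h(\Lambda_1, \Lambda_2, \ldots, \Lambda_h): Z^h \to \mathbb{R}$ is a submodular function.

Proof. We show the 'diminishing return' property of the above function $u_h$, that is,

\[
\begin{aligned}
& u_{m+1}(\Lambda_1, \Lambda_2, \ldots, \Lambda_m, \Lambda^*) - u_m(\Lambda_1, \Lambda_2, \ldots, \Lambda_m) \\
&\quad \ge u_{n+1}(\Lambda_1, \Lambda_2, \ldots, \Lambda_m, \ldots, \Lambda_n, \Lambda^*) - u_n(\Lambda_1, \Lambda_2, \ldots, \Lambda_m, \ldots, \Lambda_n),
\end{aligned}
\]

where $\Lambda_i \in Z$ for all $i$ and $\Lambda^* \in Z$.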

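We first prove the simplest case where $m = 0$ and $n = 1$, that is,

\[ u_1(\Lambda^*) - u_0(\emptyset) \ge u_2(\Lambda_1, \Lambda^*) - u_1(\Lambda_1), \]

for all $\Lambda_1, \Lambda^* \in Z$. We know that $u_0(\emptyset) = 0$. Because of symmetry, it suffices to show the above inequality for the cases where $(\Lambda_1, \Lambda^*) = ((\mathcal{A}, \mathcal{O}), (\mathcal{A}, \mathcal{O}))$ and $(\Lambda_1, \Lambda^*) = ((\mathcal{A}, \mathcal{O}), (\mathcal{O}, \mathcal{A}))$. For example, if $(\Lambda_1, \Lambda^*) = ((\mathcal{A}, \mathcal{O}), (\mathcal{A}, \mathcal{O}))$, then we can show the following:

\[
\begin{aligned}
& \alpha_{k+4} - \alpha_{k+2} - (\alpha_{k+2} - \alpha_k) && (4.1) \\
&\quad = \alpha_{k+4} + \alpha_k - 2\alpha_{k+2} && (4.2) \\
&\quad = -\alpha_k^{16} + 8\alpha_k^{14} - 24\alpha_k^{12} + 32\alpha_k^{10} && (4.3) \\
&\qquad - 14\alpha_k^{8} - 8\alpha_k^{6} + 10\alpha_k^{4} - 4\alpha_k^{2} + \alpha_k \ge 0, && (4.4)
\end{aligned}
\]

which holds for sufficiently small Type I error probability. Therefore, the inequality for the simplest case holds. The reader is referred to Fig. 4.4 for a plot of the value in Eqn. (4.2) with respect to $\alpha_k$.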

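From this case, it is easy to show that

\[
\begin{aligned}
& u_{m+1}(\Lambda_1, \Lambda_2, \ldots, \Lambda_m, \Lambda^*) - u_m(\Lambda_1, \Lambda_2, \ldots, \Lambda_m) \\
&\quad \ge u_{m+2}(\Lambda_1, \Lambda_2, \ldots, \Lambda_m, \Lambda_{m+1}, \Lambda^*) - u_{m+1}(\Lambda_1, \Lambda_2, \ldots, \Lambda_m, \Lambda_{m+1}),
\end{aligned}
\]

and

\[
\begin{aligned}
& u_{m+2}(\Lambda_1, \Lambda_2, \ldots, \Lambda_{m+1}, \Lambda^*) - u_{m+1}(\Lambda_1, \Lambda_2, \ldots, \Lambda_{m+1}) \\
&\quad \ge u_{m+3}(\Lambda_1, \Lambda_2, \ldots, \Lambda_{m+1}, \Lambda_{m+2}, \Lambda^*) - u_{m+2}(\Lambda_1, \Lambda_2, \ldots, \Lambda_{m+1}, \Lambda_{m+2}),
\end{aligned}
\]

where $\Lambda_i \in Z$ for all $i$ and $\Lambda^* \in Z$. The main result then follows by mathematical induction.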


Figure 4.4: Values of $\alpha_{k+4} - \alpha_{k+2} - (\alpha_{k+2} - \alpha_k)$ versus $\alpha_k$.

Next we show that the function $u_h(\Lambda_1, \Lambda_2, \ldots, \Lambda_h)$ is a non-decreasing function. It suffices to show the following:

\[ u_1(\Lambda^*) \ge 0, \]

for all $\Lambda^* \in Z$. For example, if $\Lambda^* = (\mathcal{A}, \mathcal{O})$, then

\[ u_1(\Lambda^*) = \alpha_k + \beta_k - \bigl(1 - (1 - \alpha_k)^2\bigr)^2 - \bigl(1 - (1 - \beta_k^2)^2\bigr) \ge 0, \]

which holds if and only if the Type I and II error probabilities are sufficiently small.

□

We have shown that the reduction of the total error probability is a submodular function. Moreover, we know that the total error probability does not change if there is no fusion, that is,

\[ u_0(\emptyset) = 0. \]

Therefore, we can employ some well-known results about the optimality of greedy algorithms on submodular functions. First, it has been shown that the greedy algorithm provides a constant-factor approximation for submodular problems.

Lemma 1: (Nemhauser et al., 1978) For a monotonic submodular function $F$ with $F(\emptyset) = 0$, the greedy strategy $S^g$ achieves at least a constant factor of the maximum value $F^*(S)$ obtained by the globally optimal strategy; i.e.,

\[ F(S^g) \ge (1 - e^{-1}) F^*(S). \]

Proof.  See  [42]  for  the  proof. 


□ 

We denote by $u_h^U$ the reduction of the total error probability when using ULRT as the fusion rule at all levels of a balanced binary relay tree. Applying Lemma 1 to our problem, we get the following theorem.

Theorem 4.2.2. Consider a balanced binary relay tree with height $2h$. We have

\[ (1 - e^{-1}) u_h^* \le u_h^U \le u_h^*. \]

Proof. The inequality on the right-hand side holds simply because $u_h^*$ is the maximum reduction of the total error probability; i.e.,

\[ u_h^U \le u_h^*. \]

We have shown that $u_h$ is a submodular function with $u_0(\emptyset) = 0$. Therefore, we can simply apply Lemma 1 to this problem:

\[ (1 - e^{-1}) u_h^* \le u_h^U. \]


□ 

Theorem 4.2.2 tells us that the ULRT strategy is essentially close to the overall optimal strategy. However, recall that the fusion strategy is a collection of fusion rules from $Z = \{(\mathcal{A}, \mathcal{O}), (\mathcal{O}, \mathcal{A})\}$. Thus, the strategies we considered in this section have at most two consecutive repeated fusion rules. For example, the strategy $(\mathcal{A}, \mathcal{A}, \mathcal{A}, \ldots)$ is not considered.

Chapter 5


M-ary Relay Trees


In this chapter, we use the above approach to study the detection performance of M-ary relay trees. In contrast to the results in [36], which only address the asymptotic regime, we derive tight upper and lower bounds for the Type I and II error probabilities at the fusion center as explicit functions of $N$. We show that the majority dominance rule is essentially sub-optimal in the case where $M$ is even. Specifically, our result shows that for all $M$,

\[ \log_2 P_N^{-1} = \Theta\bigl(N^{\log_M \lfloor (M+1)/2 \rfloor}\bigr), \]

a result not present in [36].


5.1  Problem  Formulation 


We consider the problem of binary hypothesis testing between $H_0$ and $H_1$ in an M-ary relay tree. Let $P_0$ and $P_1$ be the probability measures associated with the binary hypotheses. As shown in Fig. 5.1, leaf nodes are sensors undertaking initial and independent measurements of the same event. Only the leaves are sensors making measurements in this tree architecture. These measurements are compressed into binary messages and forwarded to the parent nodes at the next level. Each non-leaf node with the exception of the root, the fusion center, is a relay node, which combines $M$ binary messages into one new binary message and forwards the new binary message to its parent node. This process takes place at each node, culminating at the fusion center, at which the final decision is made based on the information received. The height of the tree is $\log_M N$, which grows as the number of sensors increases. Evidently, for $M = 2$, the structure is simply a balanced binary relay tree, which is the worst-case scenario in the sense of largest tree height among M-ary relay trees.

Figure 5.1: An M-ary relay tree with height k. Circles represent sensors making initial measurements. Diamonds represent relay nodes which fuse M binary messages. The rectangle at the root represents the fusion center making an overall decision.

We assume that all sensors are independent given each hypothesis, and that all sensors have identical Type I error probability (denoted by $\alpha_0$) and identical Type II error probability (denoted by $\beta_0$). We apply the majority dominance rule as the fusion rule at the relay nodes and at the fusion center. We answer the following questions about the Type I and II error probabilities:


•  How do they change as we move upward in the tree?

•  What are their explicit forms as functions of $N$?

•  Do they converge to 0 at the fusion center?

•  How fast will they converge with respect to $N$?


5.2  Error  Probability  Bounds 


We divide our analysis into two cases: $M$ is an odd integer (oddary tree) and $M$ is an even integer (evenary tree). In each case, we first derive the recursion of the Type I and II error probabilities and show that all nodes at level $k$ have the same Type I and II error probabilities $(\alpha_k, \beta_k)$. Then we study the step-wise reduction of each kind of error probability after fusion with the majority dominance rule. From these we provide upper and lower bounds for the Type I and II error probabilities at the fusion center. We then derive upper and lower bounds for the total error probability at the fusion center.


5.2.1  Oddary  tree 

Suppose that $u_o$ is the output binary message after fusing $M$ input binary messages $U_I = \{u_1, u_2, \ldots, u_M\}$, where $u_t \in \{0, 1\}$ for all $t$. The majority dominance rule, when $M$ is odd, is simply:

\[
u_o = \begin{cases}
1, & \text{if } u_1 + u_2 + \cdots + u_M > M/2, \\
0, & \text{if } u_1 + u_2 + \cdots + u_M < M/2.
\end{cases}
\]

Assume that the binary messages $\{u_t\}_{t=1}^{M}$ have identical Type I error probability $\alpha$ and identical Type II error probability $\beta$. Then, the Type I and II error probability pair $(\alpha', \beta')$ for the binary message $u_o$ is given by:

\[
\begin{aligned}
\alpha' = P_0(u_o = 1) &= \prod_{t=1}^{M} P_0(u_t = 1) + \binom{M}{1} P_0(u_t = 0) \prod_{t=1}^{M-1} P_0(u_t = 1) + \cdots \\
&\quad + \binom{M}{(M-1)/2} \prod_{t=1}^{(M+1)/2} P_0(u_t = 1) \prod_{t=1}^{(M-1)/2} P_0(u_t = 0) \\
&= f(\alpha) := \alpha^M + \binom{M}{1} \alpha^{M-1} (1 - \alpha) + \cdots + \binom{M}{(M-1)/2} \alpha^{(M+1)/2} (1 - \alpha)^{(M-1)/2}
\end{aligned}
\]

and

\[ \beta' = f(\beta) = \beta^M + \binom{M}{1} \beta^{M-1} (1 - \beta) + \cdots. \]


As all sensors have the same error probability pair $(\alpha_0, \beta_0)$, all relay nodes at level 1 will have the same error probability pair $(\alpha_1, \beta_1) = (f(\alpha_0), f(\beta_0))$. By recursion, we have

\[ (\alpha_{k+1}, \beta_{k+1}) = (f(\alpha_k), f(\beta_k)), \quad k = 0, 1, \ldots, \]

where $(\alpha_k, \beta_k)$ is the error probability pair of nodes at the $k$-th level of the tree. Since the recursions for $\alpha_k$ and $\beta_k$ are the same, it suffices to consider only the Type I error probability $\alpha_k$ in studying the decay speed. Next we will analyze the step-wise shrinkage of the Type I error probability after each fusion step. This analysis will in turn provide upper and lower bounds for the Type I error probability at the fusion center.
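The recursion $(\alpha_{k+1}, \beta_{k+1}) = (f(\alpha_k), f(\beta_k))$ is straightforward to evaluate numerically. The following sketch assumes odd $M$ and implements $f$ as the binomial sum defined above; the function names `majority_odd` and `error_pairs` are hypothetical.

```python
from math import comb


def majority_odd(p: float, m: int) -> float:
    """f(p): error probability after majority fusion of m i.i.d. messages (m odd).

    j counts the correct messages; the output is wrong when j <= (m - 1) / 2.
    """
    if m % 2 == 0:
        raise ValueError("m must be odd")
    return sum(comb(m, j) * p ** (m - j) * (1 - p) ** j
               for j in range((m - 1) // 2 + 1))


def error_pairs(alpha0: float, beta0: float, m: int, levels: int):
    """List of (alpha_k, beta_k) for k = 0, ..., levels."""
    pairs = [(alpha0, beta0)]
    for _ in range(levels):
        a, b = pairs[-1]
        pairs.append((majority_odd(a, m), majority_odd(b, m)))
    return pairs


if __name__ == "__main__":
    for k, (a, b) in enumerate(error_pairs(0.1, 0.2, 3, 3)):
        print(k, a, b)
```

For $M = 3$ the sum reduces to $f(p) = 3p^2 - 2p^3$, so $f(0.1) = 0.028$ and $f(0.2) = 0.104$, which gives a quick sanity check on the implementation.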

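Proposition 5.2.1. Consider an M-ary relay tree, where $M$ is odd. Suppose that we apply the majority dominance rule as the fusion rule. Then,

\[ 1 \le \frac{\alpha_{k+1}}{\alpha_k^{(M+1)/2}} \le 2^{M-1}. \]

Proof. Consider the ratio of $\alpha_{k+1}$ and $\alpha_k^{(M+1)/2}$:

\[ \frac{\alpha_{k+1}}{\alpha_k^{(M+1)/2}} = \alpha_k^{(M-1)/2} + \binom{M}{1} \alpha_k^{(M-3)/2} (1 - \alpha_k) + \cdots + \binom{M}{(M-1)/2} (1 - \alpha_k)^{(M-1)/2}. \]

First, we derive the lower bound of the ratio. We know that

\[
\begin{aligned}
1 &= (\alpha_k + 1 - \alpha_k)^{(M-1)/2} \\
&= \alpha_k^{(M-1)/2} + \binom{(M-1)/2}{1} \alpha_k^{(M-3)/2} (1 - \alpha_k) + \cdots + \binom{(M-1)/2}{(M-1)/2} (1 - \alpha_k)^{(M-1)/2}.
\end{aligned}
\]

Moreover, it is easy to see that

\[ \binom{M}{j} \ge \binom{(M-1)/2}{j} \]

for all $j = 1, 2, \ldots, (M-1)/2$. In consequence, we have

\[ \frac{\alpha_{k+1}}{\alpha_k^{(M+1)/2}} \ge 1. \]

Next we derive the upper bound of the ratio. Since $\alpha_k \le 1$, we have

\[ \frac{\alpha_{k+1}}{\alpha_k^{(M+1)/2}} \le 1 + \binom{M}{1} + \cdots + \binom{M}{(M-1)/2} = 2^{M-1}. \]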
□ 


Using the above proposition, we now derive upper and lower bounds for $\log_2 \alpha_k^{-1}$.

Theorem 5.2.1. Consider an M-ary relay tree, where $M$ is an odd integer. Let $\lambda_M = (M + 1)/2$. We have

\[ \lambda_M^k \bigl(\log_2 \alpha_0^{-1} - (M - 1)\bigr) \le \log_2 \alpha_k^{-1} \le \lambda_M^k \log_2 \alpha_0^{-1}. \]


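Proof. From the inequalities in Proposition 5.2.1, we have

\[ \alpha_{k+1} = c_k \alpha_k^{(M+1)/2} = c_k \alpha_k^{\lambda_M}, \]

where $c_k \in [1, 2^{M-1}]$. From these we obtain

\[ \alpha_k = c_{k-1} c_{k-2}^{\lambda_M} \cdots c_0^{\lambda_M^{k-1}} \alpha_0^{\lambda_M^k}, \]

where $c_i \in [1, 2^{M-1}]$ for all $i$, and

\[ \log_2 \alpha_k^{-1} = -\log_2 c_{k-1} - \lambda_M \log_2 c_{k-2} - \cdots - \lambda_M^{k-1} \log_2 c_0 + \lambda_M^k \log_2 \alpha_0^{-1}. \]

Since $\log_2 c_i \in [0, M - 1]$, we have

\[ \log_2 \alpha_k^{-1} \le \lambda_M^k \log_2 \alpha_0^{-1}. \]

Moreover, we obtain

\[
\begin{aligned}
\log_2 \alpha_k^{-1} &\ge -(M - 1) - \lambda_M (M - 1) - \cdots - \lambda_M^{k-1} (M - 1) + \lambda_M^k \log_2 \alpha_0^{-1} \\
&\ge \lambda_M^k \bigl(\log_2 \alpha_0^{-1} - (M - 1)\bigr).
\end{aligned}
\]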


□ 
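As a numerical sanity check of Theorem 5.2.1, the bounds can be compared against the exact recursion for small odd $M$ and a few levels. This is an illustrative sketch; the names `theorem_bounds` and `log_inverse_error` are hypothetical, and the check is limited to moderate $k$ because $\alpha_k$ underflows double precision quickly.

```python
from math import comb, log2


def majority_odd(p: float, m: int) -> float:
    """f(p) for odd m: probability that a majority of m messages is in error."""
    return sum(comb(m, j) * p ** (m - j) * (1 - p) ** j
               for j in range((m - 1) // 2 + 1))


def theorem_bounds(alpha0: float, m: int, k: int):
    """(lower, upper) bounds on log2(1 / alpha_k) from Theorem 5.2.1."""
    lam = (m + 1) // 2
    upper = lam ** k * log2(1 / alpha0)
    lower = lam ** k * (log2(1 / alpha0) - (m - 1))
    return lower, upper


def log_inverse_error(alpha0: float, m: int, k: int) -> float:
    """Exact log2(1 / alpha_k) obtained by iterating the recursion k times."""
    a = alpha0
    for _ in range(k):
        a = majority_odd(a, m)
    return log2(1 / a)


if __name__ == "__main__":
    for k in range(4):
        lo, hi = theorem_bounds(0.01, 3, k)
        print(k, lo, log_inverse_error(0.01, 3, k), hi)
```

Note that the lower bound is informative only when $\log_2 \alpha_0^{-1} > M - 1$; otherwise it is negative and holds trivially.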




In contrast to the result in [36], which only focuses on the asymptotic regime, our result holds for all finite $k$. In addition, the result in [36] deals only with the total error probability at the fusion center, whereas our approach provides bounds for both Type I and II error probabilities.

Corollary 5.2.1. Let $P_{F,N}$ be the Type I error probability at the fusion center of an M-ary relay tree, where $M$ is odd. We have

\[ N^{\log_M \lambda_M} \bigl(\log_2 \alpha_0^{-1} - (M - 1)\bigr) \le \log_2 P_{F,N}^{-1} \le N^{\log_M \lambda_M} \log_2 \alpha_0^{-1}. \]


5.2.2  Evenary  tree 

We now study the case where $M$ is even and derive upper and lower bounds for the Type I error probability. We still use the majority dominance rule (with random tie-breaking) as the fusion rule at the relay nodes and at the fusion center. The majority dominance rule in this case is:

\[
u_o = \begin{cases}
1, & \text{if } u_1 + u_2 + \cdots + u_M > M/2, \\
0 \text{ w.p. } P_b, & \text{if } u_1 + u_2 + \cdots + u_M = M/2, \\
1 \text{ w.p. } 1 - P_b, & \text{if } u_1 + u_2 + \cdots + u_M = M/2, \\
0, & \text{if } u_1 + u_2 + \cdots + u_M < M/2,
\end{cases}
\]

where $P_b$ denotes the Bernoulli parameter for the tie-breaking case. For simplicity, we assume that tie-breaking is fifty-fifty in this report; i.e., $P_b = 1/2$. In this case, the recursions for the Type I and II error probabilities are:

\[
\begin{aligned}
\alpha' = P_0(u_o = 1) &= \prod_{t=1}^{M} P_0(u_t = 1) + \binom{M}{1} P_0(u_t = 0) \prod_{t=1}^{M-1} P_0(u_t = 1) + \cdots \\
&\quad + \frac{1}{2} \binom{M}{M/2} \prod_{t=1}^{M/2} P_0(u_t = 1) \prod_{t=1}^{M/2} P_0(u_t = 0) \\
&= g(\alpha) := \alpha^M + \binom{M}{1} \alpha^{M-1} (1 - \alpha) + \cdots + \frac{1}{2} \binom{M}{M/2} \alpha^{M/2} (1 - \alpha)^{M/2}
\end{aligned}
\]

and

\[ \beta' = g(\beta) = \beta^M + \binom{M}{1} \beta^{M-1} (1 - \beta) + \cdots. \]

Next we study the step-wise reduction of each type of error probability.

Proposition 5.2.2. Consider an M-ary relay tree, where $M$ is even. Suppose that we apply majority dominance as the fusion rule. Then,

\[ 1 \le \frac{\alpha_{k+1}}{\alpha_k^{M/2}} \le 2^{M-1}. \]

The proof is similar to that of Proposition 5.2.1 and is omitted. Notice that the above result is only useful when $M \ge 4$. For the case where $M = 2$ (balanced binary relay trees), we have

\[ \alpha_{k+1} = \alpha_k^2 + \alpha_k (1 - \alpha_k) = \alpha_k \]

and

\[ \beta_{k+1} = \beta_k^2 + \beta_k (1 - \beta_k) = \beta_k; \]

that is, the Type I and II error probabilities remain the same after fusion. However, in [33], we have shown that with the unit-threshold likelihood-ratio test as the fusion rule at the relay nodes and the fusion center, the total error probability decays to 0 sub-exponentially with exponent $\sqrt{N}$.

From  the  above  proposition,  we  derive  upper  and  lower  bounds  for  the  Type  I  error 
probability  at  each  level  k. 

Theorem 5.2.2. Consider an M-ary relay tree, where M is an even integer. Let $\lambda_M = M/2$.
We have

$$
\lambda_M^k \left( \log_2 \alpha_0^{-1} - (M - 1) \right) \le \log_2 \alpha_k^{-1} \le \lambda_M^k \log_2 \alpha_0^{-1}.
$$

The proof is similar to that of Theorem 5.2.1 and is omitted. As in the
case where M is odd, we can provide upper and lower bounds for the Type I error
probability at the fusion center.
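The bounds of Theorem 5.2.2 can be checked numerically by iterating the recursion. The sketch below is our own illustration: the names `fuse_even` and `type1_bounds` are hypothetical, fair tie-breaking is assumed, and the number of levels is kept small so that the error probability does not underflow in floating point.

```python
from math import comb, log2


def fuse_even(alpha: float, M: int) -> float:
    """One step of the even-M majority recursion with fair tie-breaking."""
    half = M // 2
    strict = sum(
        comb(M, i) * alpha**i * (1 - alpha) ** (M - i)
        for i in range(half + 1, M + 1)
    )
    return strict + 0.5 * comb(M, half) * alpha**half * (1 - alpha) ** half


def type1_bounds(alpha0: float, M: int, levels: int):
    """Return (k, lower, log2(1/alpha_k), upper) for k = 1..levels."""
    lam = M // 2  # lambda_M = M/2 for even M
    alpha = alpha0
    rows = []
    for k in range(1, levels + 1):
        alpha = fuse_even(alpha, M)
        lower = lam**k * (log2(1 / alpha0) - (M - 1))
        upper = lam**k * log2(1 / alpha0)
        rows.append((k, lower, log2(1 / alpha), upper))
    return rows
```

Calling `type1_bounds(0.01, 4, 5)` lists, level by level, the value of $\log_2 \alpha_k^{-1}$ between the two bounds; each row should satisfy lower ≤ value ≤ upper.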

Corollary 5.2.2. Let $P_{F,N}$ be the Type I error probability at the fusion center of an
M-ary relay tree, where M is even. We have

$$
N^{\log_M \lambda_M} \left( \log_2 \alpha_0^{-1} - (M - 1) \right) \le \log_2 P_{F,N}^{-1} \le N^{\log_M \lambda_M} \log_2 \alpha_0^{-1}.
$$

Notice that the bounds in Corollaries 5.2.1 and 5.2.2 have the same form if we
simply let $\lambda_M = \lfloor (M + 1)/2 \rfloor$. In the next section, we use the bounds above to derive
upper and lower bounds for the total error probability at the fusion center.


5.2.3  Bounds  for  the  total  error  probability 

Let $\pi_0$ and $\pi_1$ be the prior probabilities for the underlying hypotheses. In this section,
we provide upper and lower bounds for the total error probability $P_N$ at the fusion
center. It is easy to see that

$$
P_N = \pi_0 P_{F,N} + \pi_1 P_{M,N},
$$

where $P_{F,N}$ and $P_{M,N}$ correspond to the Type I and II error probabilities at the fusion
center. With the bounds for each type of error probability, we provide bounds for the
total error probability as follows.

Theorem 5.2.3. Consider an M-ary relay tree, and let $\lambda_M = \lfloor (M + 1)/2 \rfloor$. We have

$$
N^{\log_M \lambda_M} \left( \log_2 \left( \max\{\alpha_0, \beta_0\} \right)^{-1} - (M - 1) \right) \le \log_2 P_N^{-1} \le N^{\log_M \lambda_M} \left( \pi_0 \log_2 \alpha_0^{-1} + \pi_1 \log_2 \beta_0^{-1} \right).
$$

Proof. From the definition of $P_N$, that is,

$$
P_N = \pi_0 P_{F,N} + \pi_1 P_{M,N},
$$

we have the following:

$$
P_N \le \max\{P_{F,N}, P_{M,N}\}.
$$

In addition, we know that $\alpha_k$ and $\beta_k$ have the same recursion. Therefore, the maximum
of the Type I and II error probabilities at the fusion center corresponds to the
maximum at the leaf nodes. Hence, we have

$$
N^{\log_M \lambda_M} \left( \log_2 \left( \max\{\alpha_0, \beta_0\} \right)^{-1} - (M - 1) \right) \le \log_2 P_N^{-1}.
$$

By the fact that $\log_2 x^{-1}$ is a convex function, we have

$$
\log_2 P_N^{-1} \le \pi_0 \log_2 P_{F,N}^{-1} + \pi_1 \log_2 P_{M,N}^{-1}.
$$

Therefore, we have

$$
\log_2 P_N^{-1} \le N^{\log_M \lambda_M} \left( \pi_0 \log_2 \alpha_0^{-1} + \pi_1 \log_2 \beta_0^{-1} \right).
$$


□
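
The two inequalities used in the proof can be written out step by step. The following is our own expansion, with $\phi(x) = \log_2 x^{-1}$ introduced here as shorthand (it does not appear in the original text). It assumes the Type II analogue of Corollaries 5.2.1 and 5.2.2, which holds because $\alpha_k$ and $\beta_k$ follow the same recursion.

```latex
% \phi(x) = \log_2 x^{-1} is shorthand introduced here; it is convex and decreasing.
\begin{align*}
% Lower bound: P_N is a convex combination, so it is at most the larger error.
\log_2 P_N^{-1}
  &\ge \log_2 \left( \max\{P_{F,N}, P_{M,N}\} \right)^{-1}
   \ge N^{\log_M \lambda_M} \left( \log_2 \left( \max\{\alpha_0, \beta_0\} \right)^{-1} - (M - 1) \right), \\
% Upper bound: Jensen's inequality with weights \pi_0 + \pi_1 = 1.
\log_2 P_N^{-1}
  &= \phi\left( \pi_0 P_{F,N} + \pi_1 P_{M,N} \right)
   \le \pi_0 \, \phi(P_{F,N}) + \pi_1 \, \phi(P_{M,N}) \\
  &\le N^{\log_M \lambda_M} \left( \pi_0 \log_2 \alpha_0^{-1} + \pi_1 \log_2 \beta_0^{-1} \right).
\end{align*}
```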


5.2.4  Asymptotic  rates 

In this section, we study the decay rate of the error probabilities in the asymptotic
regime. We show that in the case where M is even, the majority dominance rule is
sub-optimal. We also compare our asymptotic results with those in [36].

First, from Corollaries 5.2.1 and 5.2.2, we can easily derive the decay rate of the
Type I and II error probabilities. For example, for the Type I error probability, we have
the following.

Corollary 5.2.3. Consider an M-ary relay tree, and let $\lambda_M = \lfloor (M + 1)/2 \rfloor$. If $\alpha_0$ is fixed, then

$$
\log_2 P_{F,N}^{-1} = \Theta\left( N^{\log_M \lambda_M} \right).
$$

Proof. To analyze the asymptotic rate, we may assume that $\alpha_0$ is sufficiently small,
that is, $\alpha_0 < 2^{-(M-1)}$. In this case, the bounds in Corollaries 5.2.1 and 5.2.2 show
that

$$
\log_2 P_{F,N}^{-1} = \Theta\left( N^{\log_M \lambda_M} \right).
$$


□

It is easy to show that $\log_M \lambda_M$ is monotone increasing in M within each parity class
(that is, over odd M and over even M separately). Moreover, as M
goes to infinity, the limit of $\log_M \lambda_M$ is 1. That is to say, when M is very large, the
decay is close to exponential. In terms of tree structures, when M is very large,
the tree becomes short, and therefore achieves similar performance to that of bounded-height
trees. From the fact that the Type I and II error probabilities follow the same
recursion, it is easy to see that the Type II error probability at the fusion center decays
to 0 with exponent $N^{\log_M \lambda_M}$. Moreover, we can compute the decay rate of the
total error probability.
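
To make the behaviour of the exponent concrete, the following sketch evaluates $\log_M \lambda_M$ with $\lambda_M = \lfloor (M+1)/2 \rfloor$. The function name `decay_exponent` is ours and the snippet is only an illustration of the formula above.

```python
from math import log


def decay_exponent(M: int) -> float:
    """Return log_M(lambda_M) with lambda_M = floor((M + 1) / 2)."""
    if M < 2:
        raise ValueError("M must be at least 2")
    lam = (M + 1) // 2  # (M + 1)/2 for odd M, M/2 for even M
    return log(lam) / log(M)


# Exponents for a few tree arities, listed as (M, exponent) pairs.
table = [(M, decay_exponent(M)) for M in (2, 3, 4, 5, 6, 7, 8, 9)]
```

The exponent is 0 for M = 2 and 1/2 for M = 4. It increases along the odd values and along the even values of M, but an even M yields a smaller exponent than the odd value just below it, which is in line with the sub-optimality of majority dominance for even M discussed in this section.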

Corollary 5.2.4. Suppose that $(\alpha_0, \beta_0)$ is fixed. Given any prior probabilities, we have

$$
\log_2 P_N^{-1} = \Theta\left( N^{\log_M \lambda_M} \right).
$$

For the total error probability at the fusion center, the same arguments apply as
for the individual error probabilities. For large M, the decay of the total error probability
is close to exponential.

Recall the results from [36], in which it is shown that, with any combination of
fusion rules,

$$
\log_2 P_N^{-1} = O\left( N^{\log_M \frac{M+1}{2}} \right). \tag{5.1}
$$

The case where the relay nodes and the fusion center use the majority dominance rule
(with random tie-breaking) to combine messages was considered in [36], in which case
the decay rate of the total error probability is almost optimal. More precisely,

$$
\log_2 P_N^{-1} = \Omega\left( N^{\log_M \lfloor \frac{M+1}{2} \rfloor} \right).
$$

Our results for the odd M case are consistent with the results in [36]. The majority
dominance rule in this case is essentially optimal in the sense of achieving the largest
decay exponent:

$$
\log_2 P_N^{-1} = \Theta\left( N^{\log_M \lfloor \frac{M+1}{2} \rfloor} \right) = \Theta\left( N^{\log_M \frac{M+1}{2}} \right). \tag{5.2}
$$

However, in the case where M is even, our results show that

$$
\log_2 P_N^{-1} = O\left( N^{\log_M \frac{M}{2}} \right). \tag{5.3}
$$

Compared with (5.1), which is the upper bound for $\log_2 P_N^{-1}$ in [36], our upper
bound (5.3) is tighter in the case of even M and has exactly the same form as
the lower bound; that is, we find the explicit decay rate (5.2) of the total error
probability in this case. Second, the decay exponent shows that the majority dominance
rule in this case is essentially sub-optimal in terms of the decay exponent.
For example, in the case of binary relay trees, the total error probability remains
unchanged after fusion with the majority dominance rule. On the other hand, the likelihood-ratio
test with unit threshold achieves the decay exponent $\sqrt{N}$ [33].

In [36], the case where non-binary message alphabets are allowed in M-ary relay
trees is considered. Suppose that all nodes in the tree, with the exception of the fusion
center, are allowed to transmit messages from message alphabets with size m. Then
with the scheme in [36],

$$
\log_2 P_N^{-1} = \Omega\left( N^{\rho} \right),
$$

where $\rho = 1 + \ln(1 - 1/m) / \ln M$.

In this report, we introduce another message-passing scheme for M-ary relay trees
with non-binary message alphabets. We show how the decay exponent increases with
respect to the size of the non-binary message alphabet. Compared with the scheme in
[36], we show that in order to achieve the same decay exponent, our scheme involves
much lower average message sizes.


5.3  Non-binary  Message  Alphabets 


Consider the binary hypothesis testing problem in M-ary relay trees. We characterize
the detection performance by looking at the total error probability $P_N$ at the fusion
center. We have derived in [37] the decay rate of the total error probability at the fusion
center in the case where relay messages are all binary, that is,

$$
\log_2 P_N^{-1} = \Theta\left( N^{\log_M \lfloor \frac{M+1}{2} \rfloor} \right).
$$

In this report, we allow a more general (non-binary) message alphabet with size $\mathcal{D}$, and
we denote this tree by $(M, \mathcal{D})$-tree. We have studied the detection performance of
$(M, 2)$-trees in [37] by investigating how fast the total error probability decays to 0.
What about the detection performance when $\mathcal{D}$ is an arbitrary finite integer?

We denote by $u_o^k$ the output message for each node at the k-th level after fusing M
input messages $\mathbf{u}^{k-1} = \{u_1^{k-1}, u_2^{k-1}, \ldots, u_M^{k-1}\}$ from its child nodes at the $(k-1)$-th
level, where $u_j^{k-1} \in \{0, 1, \ldots, \mathcal{D} - 1\}$ for all $j \in \{1, 2, \ldots, M\}$. First, we consider
an $(M, \mathcal{D})$-tree with height $k_0$, in which there are $M^{k_0}$ sensors. We assume that the
message alphabet size is sufficiently large; more precisely,

$$
\mathcal{D} \ge M^{k_0 - 1} + 1. \tag{5.4}
$$

Suppose that each sensor compresses its measurement into a binary message $u_o^0 \in \{0, 1\}$
and sends it upward to its parent node. Moreover, each relay node simply sums
up the messages it receives from its immediate child nodes and sends the summation
to its parent node; that is,

$$
u_o^k = \sum_{t=1}^{M} u_t^{k-1}.
$$

Then we can show that the output message for each node at the k-th level is an integer
from $\{0, 1, \ldots, M^k\}$ for all $k \in \{0, 1, \ldots, k_0 - 1\}$. Moreover, this message essentially
represents the number of its own child sensors that send ‘1’ upward. (A child sensor of
a node in the tree is any leaf node (sensor) attached to the subtree rooted at that node.)

Because of inequality (5.4), at each level k in the tree, the message alphabet size $\mathcal{D}$
is large enough to represent all possible values of the transmitted message $u_o^k$
($k \in \{0, \ldots, k_0 - 1\}$). In particular,
the fusion center (at level $k_0$) knows the total number of sensors that send ‘1’ upward.
In this case, the detection performance is the same as that of a parallel configuration, where
each sensor sends a binary message to the fusion center directly. Recall that in
parallel configurations, the total error probability decays exponentially fast to 0.


Figure 5.2: A message-passing scheme for non-binary message alphabets in an M-ary
relay tree.
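
The summation scheme described above can be sketched in a few lines. This is our own illustration under the assumption that the number of sensors is a power of M; the names `sum_up_tree` and `required_alphabet_size` are hypothetical.

```python
def sum_up_tree(leaf_bits, M):
    """Return the list of messages at every level when relays forward sums.

    levels[0] holds the binary sensor messages and levels[-1] holds the
    single value available at the fusion center.
    """
    levels = [list(leaf_bits)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        if len(prev) % M != 0:
            raise ValueError("number of sensors must be a power of M")
        # Each parent sums the messages of its M immediate children.
        levels.append([sum(prev[i:i + M]) for i in range(0, len(prev), M)])
    return levels


def required_alphabet_size(M, k0):
    """Alphabet size that suffices for a tree of height k0, as in (5.4).

    The largest transmitted value is M**(k0 - 1), sent from level k0 - 1.
    """
    return M ** (k0 - 1) + 1
```

With nine sensors and M = 3, the bits `[1, 0, 1, 1, 0, 0, 1, 0, 1]` produce the level-1 messages `[2, 1, 2]` and the fusion center receives 5, the total number of ones, exactly as in a parallel configuration.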


Next we consider the case where the tree height is very large. As shown in Fig. 5.2,
we apply the scheme described above; that is, the sensors send binary compressions of
their measurements upward to their parent nodes. Moreover, each relay node simply
sends the sum of the messages received to the parent node; i.e.,

$$
u_o^k = \sum_{t=1}^{M} u_t^{k-1}. \tag{5.5}
$$

From the assumption of large tree height, it is easy to see that the message alphabet
size is not large enough for all the relay nodes to use the fusion rule described in (5.5).
With some abuse of notation, we let $k_0$ be the integer $k_0 = \lceil \log_M(\mathcal{D} - 1) \rceil$. Note
that

$$
M^{k_0 - 1} + 1 < \mathcal{D} \le M^{k_0} + 1. \tag{5.6}
$$

From the previous analysis, we can show that with this scheme each node at the
$k_0$-th level knows the number of ‘1’s from its child sensors. Therefore, it is equivalent
to consider the case where the nodes at level $k_0$ connect to $M^{k_0}$ sensors directly (all
the intermediate relay nodes can be ignored). However, we cannot use the fusion rule
described in (5.5) for the nodes at the $k_0$-th level to generate the output messages because
the message alphabet size is not large enough. Hence, we let each node at level $k_0$
aggregate the M messages from its immediate child nodes into a new binary message
using the majority dominance rule (with random tie-breaking; same fusion rule as in
[37]). Therefore, the output message from each node at the $k_0$-th level is binary again.
We can simply apply the fusion rule (5.5) and repeat this process throughout the tree,
culminating at the fusion center.
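
A minimal sketch of this block-wise scheme is given below. It is our own illustration, not an implementation from the report: the names `block_depth`, `majority` and `fuse_tree` are hypothetical, the number of sensors is assumed to be a power of M, and we assume that a fusion center sitting inside an incomplete block also decides by majority.

```python
import random


def block_depth(M, D):
    """Integer computation of k0 = ceil(log_M(D - 1)); assumes M >= 2, D >= 3."""
    if M < 2 or D < 3:
        raise ValueError("need M >= 2 and D >= 3")
    k0, power = 0, 1
    while power < D - 1:
        power *= M
        k0 += 1
    return k0


def majority(count, total, rng):
    """Majority dominance over `total` binary votes, fair coin on ties."""
    if 2 * count > total:
        return 1
    if 2 * count < total:
        return 0
    return rng.randint(0, 1)


def fuse_tree(leaf_bits, M, D, rng=random):
    """Decision at the fusion center for the sum-then-majority scheme."""
    k0 = block_depth(M, D)
    messages, level = list(leaf_bits), 0
    while len(messages) > 1:
        if len(messages) % M != 0:
            raise ValueError("number of sensors must be a power of M")
        # Rule (5.5): every node forwards the sum of its M inputs.
        messages = [sum(messages[i:i + M]) for i in range(0, len(messages), M)]
        level += 1
        if level % k0 == 0:
            # Alphabet exhausted: compress the count over M**k0 inputs to one bit.
            messages = [majority(c, M ** k0, rng) for c in messages]
    if level % k0 != 0:
        # Assumption: an incomplete final block is also resolved by majority.
        return majority(messages[0], M ** (level % k0), rng)
    return messages[0]
```

For M = 3 and $\mathcal{D} = 5$ we get $k_0 = 2$, so every group of nine sensors is reduced to one bit by a majority vote over the nine underlying messages, which is the behaviour of a $(9, 2)$-tree.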

Theorem 5.3.1. The detection performance of $(M, \mathcal{D})$-trees is equal to that of $(M^{k_0}, 2)$-trees,
where $k_0 = \lceil \log_M(\mathcal{D} - 1) \rceil$. In particular, if $P_N$ is the total error probability at
the fusion center for an $(M, \mathcal{D})$-tree, then

$$
\log_2 P_N^{-1} = \Theta\left( N^{\varrho} \right),
$$

where

$$
\varrho := \begin{cases}
\dfrac{\ln(M^{k_0} + 1)}{\ln M^{k_0}} - \dfrac{\log_M 2}{k_0}, & \text{if M is odd}, \\[2ex]
1 - \dfrac{\log_M 2}{k_0}, & \text{if M is even}.
\end{cases}
$$


Proof. Consider an $(M, \mathcal{D})$-tree with the scheme described above. It is easy to see that
equivalently we can consider a tree where the sensors connect to the nodes at the $k_0$-th
level directly. In addition, because of the recursive strategy applied throughout the tree,
it suffices to consider the tree where the nodes at the $\ell k_0$-th level connect to the nodes
at the $(\ell + 1) k_0$-th level directly for all non-negative integers $\ell$. Therefore, the detection
performance of the $(M, \mathcal{D})$-tree is equal to that of the corresponding $(M^{k_0}, 2)$-tree.

We have shown in [37] that the total error probability in $(M, 2)$-trees decays to 0
with the following rate:

$$
\log_2 P_N^{-1} = \Theta\left( N^{\log_M \lfloor \frac{M+1}{2} \rfloor} \right).
$$

Therefore, the decay rate for $(M^{k_0}, 2)$-trees is simply

$$
\log_2 P_N^{-1} = \Theta\left( N^{\log_{M^{k_0}} \lfloor \frac{M^{k_0}+1}{2} \rfloor} \right),
$$

which can be simplified easily as follows:

$$
\log_2 P_N^{-1} = \Theta\left( N^{\varrho} \right),
$$

where

$$
\varrho = \begin{cases}
\dfrac{\ln(M^{k_0} + 1)}{\ln M^{k_0}} - \dfrac{\log_M 2}{k_0}, & \text{if M is odd}, \\[2ex]
1 - \dfrac{\log_M 2}{k_0}, & \text{if M is even}.
\end{cases}
$$


□


Notice that $\lim_{M \to \infty} \ln(M^{k_0} + 1) / \ln M^{k_0} = 1$, which means that the even and
odd cases in the expression for $\varrho$ are similar. Hence, in what follows, we
simply analyze the case where M is even. From Theorem 5.3.1, we can see that, with a
larger message alphabet size, the total error probability decays more quickly. However,
the change in the decay exponent is not significant because $k_0$ depends on $\mathcal{D}$ logarithmically.
Furthermore, if M is large, then the performance becomes less
sensitive to the increase in $\mathcal{D}$.
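
The dependence of the exponent on the alphabet size can be tabulated with a short sketch. The code below is our own illustration of Theorem 5.3.1; the names `block_depth` and `nonbinary_exponent` are hypothetical, and $\mathcal{D} \ge 3$ is assumed so that $k_0 \ge 1$.

```python
from math import log


def block_depth(M, D):
    """Integer computation of k0 = ceil(log_M(D - 1)); assumes M >= 2, D >= 3."""
    if M < 2 or D < 3:
        raise ValueError("need M >= 2 and D >= 3")
    k0, power = 0, 1
    while power < D - 1:
        power *= M
        k0 += 1
    return k0


def nonbinary_exponent(M, D):
    """Decay exponent of an (M, D)-tree, i.e. that of an (M**k0, 2)-tree."""
    arity = M ** block_depth(M, D)
    lam = (arity + 1) // 2  # floor((M**k0 + 1) / 2)
    return log(lam) / log(arity)
```

For M = 2 the exponent is 0 with $\mathcal{D} = 3$, 1/2 with $\mathcal{D} = 5$, 2/3 with $\mathcal{D} = 9$ and 3/4 with $\mathcal{D} = 17$: doubling the alphabet size only adds one to $k_0$, which illustrates the logarithmic dependence noted above.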


5.4  Scheme  Comparison 


In this section, we compare our scheme to that of [36]. We show that in order to achieve
the same decay exponent, the average message size used in our scheme is much smaller
than that used in [36].

First, notice that the result in [36] is a lower bound for the decay rate. On the other
hand, our result contains the explicit decay rate of the total error probability using our
scheme. The decay exponent in [36] is $N^{\rho}$, where $\rho = 1 - \log_M(m/(m - 1))$. The
Taylor expansion for $\rho$ as $m \to \infty$ is

$$
\rho = 1 - \frac{1}{\ln M} \left( \frac{1}{m} + \frac{1}{2 m^2} + O\left( \frac{1}{m^3} \right) \right).
$$

Therefore, for large m we have

$$
\rho \approx 1 - \frac{1}{\ln M} \cdot \frac{1}{m}.
$$

On the other hand, the decay exponent associated with our scheme is $N^{\varrho}$, where

$$
\varrho = 1 - \frac{\log_M 2}{k_0} = 1 - \frac{1}{\log_2 M} \left( \frac{1}{k_0} \right).
$$

Therefore, in order to achieve the same decay exponent in the asymptotic sense, we
should have

$$
m = \Theta(k_0).
$$

Notice that in our scheme the maximum alphabet size required is $M^{k_0 - 1} + 1$. Therefore,
the maximum alphabet size in our scheme is much larger than that of the scheme
in [36]. However, all the nodes in [36] use the same message alphabet with size m. In
our scheme only very few nodes in the tree actually use the maximum message
alphabet. For example, the sensors only send binary messages upward to their parent
nodes.


It is interesting to compare the average message size used in order to achieve such
detection performance. For the scheme in [36], the average message size used (in bits)
is simply $b = \log_2 m = \Theta(\log_2 k_0)$. On the other hand, the average size in bits used in
our scheme can be calculated as follows:

$$
\bar{b}(k_0) = \frac{M^{k_0} + M^{k_0 - 1} \log_2(M + 1) + \cdots + M \log_2(M^{k_0 - 1} + 1)}{M^{k_0} + M^{k_0 - 1} + \cdots + M}.
$$
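
The average message size defined above is easy to evaluate numerically. The sketch below is our own illustration with the hypothetical name `average_bits`. It counts, within one block of $k_0$ levels, $M^{k_0 - j}$ nodes at level j, each sending $\log_2(M^j + 1)$ bits.

```python
from math import log2


def average_bits(M: int, k0: int) -> float:
    """Average message size in bits over one block of k0 levels."""
    if M < 2 or k0 < 1:
        raise ValueError("need M >= 2 and k0 >= 1")
    # Level j = 0 are the sensors (1 bit each); level j sends values in {0..M**j}.
    bits = sum(M ** (k0 - j) * log2(M ** j + 1) for j in range(k0))
    nodes = sum(M ** (k0 - j) for j in range(k0))
    return bits / nodes
```

Because $\log_2(M^j + 1) \le 1 + j \log_2 M$, the returned value stays below $1 + \log_2 M / (M - 1)$ for every $k_0$. For M = 2 this means fewer than 2 bits on average, however large $k_0$ is.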

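We have

$$
\log_2(M^k + 1) > \log_2 M^k = k \log_2 M
$$

and

$$
\log_2(M^k + 1) \le \log_2(2 M^k) = 1 + k \log_2 M
$$

for all k. Therefore, the average size in bits is lower bounded by the following inequality:

$$
\begin{aligned}
\bar{b}(k_0) &\ge \frac{M^{k_0} + M^{k_0 - 1} \log_2 M + \cdots + M (k_0 - 1) \log_2 M}{M^{k_0} + M^{k_0 - 1} + \cdots + M} \\
&= \frac{M^{k_0}}{M^{k_0} + M^{k_0 - 1} + \cdots + M}
 + \frac{\log_2 M \left( M^2 (M^{k_0 - 1} - 1) - M (M - 1)(k_0 - 1) \right) / (M - 1)^2}{M^{k_0} + M^{k_0 - 1} + \cdots + M} \\
&= \frac{M^{k_0} - M^{k_0 - 1}}{M^{k_0} - 1}
 + \frac{M \log_2 M}{M - 1} \cdot \frac{M^{k_0 - 1} - 1 - (M - 1)(k_0 - 1)/M}{M^{k_0} - 1}.
\end{aligned}
$$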


In addition, it is upper bounded by:

$$
\bar{b}(k_0) \le 1 + \frac{M \log_2 M}{M - 1} \cdot \frac{M^{k_0 - 1} - 1 - (M - 1)(k_0 - 1)/M}{M^{k_0} - 1}.
$$

From these inequalities, it is easy to show that as $k_0 \to \infty$,

$$
1 - \frac{1}{M} + \frac{\log_2 M}{M - 1} \le \lim_{k_0 \to \infty} \bar{b}(k_0) \le 1 + \frac{\log_2 M}{M - 1}.
$$

Therefore, with large $k_0$, the average message size in terms of bits in our scheme is
much smaller than that in [36]:

$$
b = \log_2 m = \Theta(\log_2 k_0) \gg 1 + \frac{\log_2 M}{M - 1} \ge \bar{b}(k_0).
$$


Chapter 6


Information  Geometry  of  Target 
Tracking  Sensor  Networks 


The work in this chapter was done by Yongqiang Cheng, Xuezhi Wang, Mark Morelande,
and William Moran.


6.1  Introduction 


Advanced technologies for sensing, computing and networking create enormous opportunities
for handling, gathering and processing measurement information via various
sensor networks. It is desirable to assess the performance of a sensor network effectively
in many application fields, where the statistical properties of sensor networks
are crucial. Information geometry, which is gradually gaining significance as it allows
the analysis of statistical properties of sensor networks from a unified perspective, has
been identified as a sophisticated and powerful tool for this purpose [43, 44].

Information geometry is the study of intrinsic properties of manifolds of probability
distributions [44], where the ability of data to discriminate those distributions is
translated into a Riemannian metric¹. Specifically, the Fisher information provides a
local measure of discrimination of the distributions that translates immediately into a
Riemannian metric on the parameter manifold of the distributions. The main tenet of
information geometry is that many important notions (e.g. Fisher information, testing,
estimation, estimation accuracy) in probability theory, information theory and statistics
can be treated as structures (e.g. metric, divergence, projection, embedded curvatures)
in differential geometry by regarding the space of probabilities as a differentiable manifold
endowed with a Riemannian metric and a family of affine connections, including,
but not limited to, the canonical Levi-Civita affine connection [45]. By providing the
means to analyse the Riemannian geometric properties of various families of probability
density functions, information geometry offers comprehensive results about statistical
models simply by considering them as geometrical objects.

¹ A Riemannian metric is an inner product defined on the tangent space of a manifold. It encodes how to
measure distances, angles and area at a particular point on a manifold by specifying a scalar product between
tangent vectors at that point.

This geometric theory of statistics was pioneered in the 1940s by Rao [46], who first
interpreted the Fisher information matrix as a Riemannian metric on the space of probability
distributions. Since then many scholars have contributed to the development
of this theory for statistical models. In 1972, Chentsov [47] introduced a family
of affine connections and proved the uniqueness of the intrinsic metric and the one-parameter
family of affine connections. Meanwhile, Efron [48] undertook pioneering
work in a slightly different direction. He defined a concept of curvature called statistical
curvature and described the basic role of curvature in the high-order asymptotic
theory of statistical inference. Since then, several different groups have brought to maturity
the theoretical framework of statistical geometry. Of particular note is the work
of Amari and his collaborators [45, 49, 50], who have developed a duality structure theory
and have unified all of these theories in a differential-geometrical framework which
not only enriches the theory of information geometry but also provides opportunities
for a wide range of applications. Amari’s major motivation is in machine learning.
Here, we study the theory from a statistical signal processing perspective.

Information geometry has found many applications in the asymptotic theory of statistical
inference [51], semiparametric statistical inference [52], the study of Boltzmann
machines [53], the Expectation-Maximization (EM) algorithm [54], and learning of neural
networks [55], all with a certain degree of success. In the last two decades, its application
has spanned several disciplines such as information theory [56, 57], systems
theory [58, 59], mathematical programming [60], and statistical physics [61, 62]. It
has also played a central role in multi-terminal estimation theory [63]. In neuroscience
it has been used to extract higher-order interactions among neurons [64]. Many researchers
around the world are applying information geometry to new applications and
formulating new interpretations. An example of the former is the derivation of the
intrinsic Cramér-Rao bound for the subspace tracking problem on manifolds given in
[65].

Information geometry can also provide new viewpoints in the analysis of sensing
systems. While important, understanding information geometry theory is nontrivial.
Sensor networks for target tracking form an important class of information networks.
It is well understood that the performance of target detection and tracking depends
heavily on the sensing ability of the underlying sensor network, which may consist
of sensors ranging from large ones such as radars to small ones such as motes. The advances in engineering
and sensing technologies enable more complex sensor networks to be built for
target detection and tracking. The evaluation of sensor network performance becomes
increasingly important, in particular for sensor network design, configuration and optimization.
We believe that information geometry is able to offer advanced tools to allow
us to explore and therefore understand the structures of sensor systems. This work
aims to explore this potential in a simple and sensible way, using basic sensor
problems as exemplars.

In our recent work in [66] and [67], the integrated Fisher information distance
(IFID) between two targets was approximately calculated and used to measure target
resolvability in the region of interest covered by a sensor network. Nevertheless, the
proposed approximation for calculating IFID is only valid for closely spaced targets
and the exact IFID must be evaluated by computing the integral along the geodesic
connecting the two target states, which is generally nontrivial.

In this report, the connections between information geometry and the performance
of sensor networks for target tracking are explored in an attempt to gain a better understanding
of sensor network measurement issues. The exact calculations of the IFID and
Ricci curvatures for sensor networks with a joint likelihood are presented and
analyzed. The interpretation of the geometry of statistical manifolds for sensor networks
is illustrated via the affine immersion. The analysis is presented via three typical
sensor network scenarios: 1) a simple range-bearing radar, 2) two bearings-only passive
sonars, and 3) three ranges-only detectors. In these scenarios it is
shown how information geometry can be used to address system measurement issues
such as evaluating sensor capability to distinguish closely spaced targets, measuring
the amount of information collected by sensors and solving the problem of optimal
scheduling of network sensors and resources. Although simple synchronized sensor
networks with sensors of the same type are considered in the demonstrative examples,
the analysis method can be applied to a more general case where dissimilar sensors are
involved as long as the likelihood and Fisher information matrix of the measurement
system are available.

The major contributions of this chapter are summarised below.

1. The IFID between the states of two targets is computed by solving the geodesic
equations and is used to measure the ability of a sensor network to resolve targets.
The differences between IFID and the well known Kullback-Leibler divergence
are described. The relationship with the energy functional, which is the integrated
differential Kullback-Leibler divergence, and the differences between it and the
other two measures of divergence are described.

2. The structures of statistical manifolds are elucidated by computing the canonical
Levi-Civita affine connection as well as Riemannian and scalar curvatures. The
relationship between the Ricci curvature tensor field and the amount of information
achievable by the network sensors is highlighted.

3. An analytical presentation of statistical manifolds as immersions in Euclidean
space for the distributions of the exponential family is given.

The rest of the chapter is organized as follows. In the next section, the problem
of interest and the motivations of this work are described. The principles of information
geometry are then introduced in Section 6.3. In Section 6.4, sensor network
information, as measured by the IFID, is analyzed for three basic types of sensor network
problems; the canonical Levi-Civita affine connection as well as Riemannian and
scalar curvatures are calculated to elucidate the structure of the statistical manifold; an
interpretation of the Ricci curvature tensor field related to information issues is discussed
at the end of this section. The affine immersions of manifolds corresponding to sensor
networks are presented in Section 6.5, which is followed by the conclusions in Section
??.


6.2  Target tracking in sensor networks


Let the target state at time k be denoted as an n-dimensional vector², i.e.,
$\boldsymbol{\theta}_k = [\theta_{1,k}, \cdots, \theta_{n,k}]^T \in \mathbb{R}^n$,
where the superscript T denotes the matrix transpose. Target dynamics are assumed to
follow a Markov process with additive Gaussian noise,

$$
\boldsymbol{\theta}_{k+1} = f(\boldsymbol{\theta}_k) + \mathbf{v}_k, \quad \mathbf{v}_k \sim \mathcal{N}(\mathbf{0}, Q_k), \tag{6.1}
$$

where f is the system transition (dynamical) model and $\mathbf{v}_k$ represents process noise,
which is assumed to follow a zero-mean Gaussian distribution with covariance matrix $Q_k$.

The measurement of the system at time k is modeled as

$$
\mathbf{x}_k = \boldsymbol{\mu}(\boldsymbol{\theta}_k) + \mathbf{w}_k, \quad \mathbf{w}_k \sim \mathcal{N}(\mathbf{0}, C_k), \tag{6.2}
$$

where $\boldsymbol{\mu}$ is the measurement-to-target state space transition function and $\mathbf{w}_k$ is the
measurement noise, approximated by a zero-mean Gaussian distribution with covariance
matrix $C_k$. The problem of target tracking is to find the posterior probability density of
the target state based on a sequence of measurements, i.e., $p(\boldsymbol{\theta}_k | \mathbf{x}_{1:k})$, where $\mathbf{x}_{1:k}$ stands
for a sequence of measurements up to time k.

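Standard techniques of Bayesian estimation yield a recursive solution, given by

$$
p(\boldsymbol{\theta}_k | \mathbf{x}_{1:k}) = \frac{p(\mathbf{x}_k | \boldsymbol{\theta}_k) \, p(\boldsymbol{\theta}_k | \mathbf{x}_{1:k-1})}{\int p(\mathbf{x}_k | \boldsymbol{\theta}_k, \mathbf{x}_{1:k-1}) \, p(\boldsymbol{\theta}_k | \mathbf{x}_{1:k-1}) \, d\boldsymbol{\theta}_k}, \tag{6.3}
$$

where $p(\mathbf{x}_k | \boldsymbol{\theta}_k)$ is the measurement likelihood and the predicted density $p(\boldsymbol{\theta}_k | \mathbf{x}_{1:k-1})$
is determined by the posterior density $p(\boldsymbol{\theta}_{k-1} | \mathbf{x}_{1:k-1})$ at time $k - 1$ and the transition
density $p(\boldsymbol{\theta}_k | \boldsymbol{\theta}_{k-1})$:

$$
p(\boldsymbol{\theta}_k | \mathbf{x}_{1:k-1}) = \int p(\boldsymbol{\theta}_k | \boldsymbol{\theta}_{k-1}) \, p(\boldsymbol{\theta}_{k-1} | \mathbf{x}_{1:k-1}) \, d\boldsymbol{\theta}_{k-1}. \tag{6.4}
$$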

The measurement likelihood plays a central role in the “Bayesian update” algorithm
of (6.3). In fact, this measurement density function is fully determined by the intrinsic
properties of the underlying sensor network and its form has a great influence
on the computational solution of the tracking problem. In practice, the likelihoods
of most implementable sensor networks, such as the binary sensor networks in [68]
and those presented in this report, belong to the exponential class of density functions,
called exponential families [69]. Many popular distributions, such as the Gaussian, Poisson,
Gamma and Dirichlet distributions, are exponential families.
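
The prediction and update steps of the Bayesian recursion can be illustrated on a grid for a scalar state. The sketch below is our own and makes several simplifying assumptions that are not in the text: the state is one-dimensional, the noise variances `q` and `c` are constant in time, and the integrals are replaced by Riemann sums on a uniform grid. All function names are hypothetical.

```python
from math import exp, pi, sqrt


def gauss(x, mean, var):
    """Scalar Gaussian density N(x; mean, var)."""
    return exp(-0.5 * (x - mean) ** 2 / var) / sqrt(2 * pi * var)


def predict(grid, posterior, f, q):
    """Discretised prediction step: transition density times previous posterior."""
    step = grid[1] - grid[0]
    return [
        step * sum(gauss(t, f(s), q) * p for s, p in zip(grid, posterior))
        for t in grid
    ]


def update(grid, predicted, x, mu, c):
    """Discretised update step: likelihood times prediction, normalised over the state."""
    step = grid[1] - grid[0]
    unnorm = [gauss(x, mu(t), c) * p for t, p in zip(grid, predicted)]
    norm = step * sum(unnorm)
    return [u / norm for u in unnorm]


def track(measurements, grid, prior, f, mu, q, c):
    """Run the predict/update recursion over a sequence of measurements."""
    density = list(prior)
    for x in measurements:
        density = update(grid, predict(grid, density, f, q), x, mu, c)
    return density


def mean(grid, density):
    """Posterior mean computed as a Riemann sum."""
    step = grid[1] - grid[0]
    return step * sum(t * p for t, p in zip(grid, density))
```

With identity dynamics and measurement functions the result can be compared against a scalar Kalman filter, which is a convenient sanity check for the grid resolution.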

One of the most challenging issues in target tracking is the differentiation of a target
measurement from those due to other targets and clutter, which is also known as
data association [70]. In the presence of measurement uncertainties, it is important to
know how well two closely spaced targets can be differentiated using the measurements
taken by a sensor network. The ability to separate two closely spaced targets using
their measurements is called target resolvability, which is essentially a property of the
likelihood and can be intuitively described via the platform of information geometry in
terms of the IFID between two points on a statistical manifold.

² In this report, a symbol in bold face is used to denote a vector and the subscript k refers to the time index.
Sometimes, the time index is dropped and the subscript is subsequently used to index the location of a
vector without causing confusion.

In this report, the major concern is with how much and what kind of target information
can be obtained from the measurements of a sensor network at a particular
time. Issues of interest include identifiability of the underlying state with respect to
the sensor measurements and analysis and optimization of the information gathering
capacity of the sensor network. Information geometry provides intuitive, geometrical
interpretations of these problems. In the exploration of sensor network measurements
in the context of statistical manifolds, the following issues are of interest:


1. Calculation of the distance, i.e., the integrated Fisher information distance (IFID),
between two targets on a statistical manifold, and observation of how the Euclidean
distance differs from the information distance between two targets. The IFID is
a candidate criterion for measuring the resolvability of closely spaced targets. Also
of interest is how the Kullback-Leibler divergence is related to the IFID. Another
related distance measure is the integrated differential Kullback-Leibler
divergence, which is exactly the energy functional.

2. The amount of target information that can be acquired from the measurements
of a sensor network depends on the structure of the corresponding statistical manifold,
as described by the curvatures of Riemannian geometry. We wish to calculate
the canonical connection as well as the curvatures of the statistical manifold for a
given sensor network and to show how the Ricci curvature tensor field is related to
the performance of sensor networks from the information perspective.

3. The underlying shape of a given statistical manifold is of great interest. We
explore the embedding of a parameter manifold corresponding to a given sensor
network in a flat Euclidean space. Knowledge of the manifold shape is useful in
the development of computational tools for measuring and controlling systems
and in optimising the relative error performance obtained from sensor network
measurements.


6.3  Principles  of  Information  Geometry 

6.3.1  Definition  of  statistical  manifold 

Information geometry originates with the study of manifolds of parameters arising from
parameterized families of probability distributions, which are the standard basic constructs
of estimation theory. Consider the parameterized family of probability distributions
$S = \{p(\mathbf{x}|\boldsymbol{\theta})\}$, where $\mathbf{x}$ is a random variable and
$\boldsymbol{\theta} = [\theta_1, \cdots, \theta_n]^T$ is an $n$-dimensional parameter vector
specifying the distribution. In a general context, the parameter
vector resides on an abstract manifold. It is possible to think of sensing situations where
the underlying manifold has the topological structure of a more complex geometrical
object such as a sphere, a torus, or the space of orthogonal matrices [65]. The
family $S$ is regarded as a statistical manifold with $\boldsymbol{\theta}$ as its (possibly local)
coordinate system [64].

Figure 6.1 illustrates the definition of a statistical manifold. For a given state of
interest $\boldsymbol{\theta}$ in the parameter space $\Theta \subseteq \mathbb{R}^n$, the measurement
$\mathbf{x}$ in the sample space $X \subseteq \mathbb{R}^m$ is an instantiation of a probability
distribution $p(\mathbf{x}|\boldsymbol{\theta})$. Each probability
distribution $p(\mathbf{x}|\boldsymbol{\theta})$ is labelled by a point $s(\boldsymbol{\theta})$ in the
manifold $S$. The parameterized family of probability distributions
$S = \{p(\mathbf{x}|\boldsymbol{\theta})\}$ forms an $n$-dimensional statistical
manifold in which $\boldsymbol{\theta}$ plays the role of a coordinate system of $S$. In more general
situations, such as a sphere, a torus or the orthogonal group, this coordinate system will be
local, and will change depending on the part of the manifold under consideration. In
our examples, the coordinate system, which denotes the target state of interest, is global.

Figure 6.1: Illustration of the relation between parameter, measurement and the
corresponding statistical manifold.


6.3.2 The metric and integrated Fisher information distance


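For a parameterized family of probability distributions on a statistical manifold, the
Fisher information matrix (FIM) plays the role of a Riemannian metric tensor [46].
Denoted by $G(\boldsymbol{\theta}) = [g_{ij}(\boldsymbol{\theta})]$, the FIM is defined as

$$g_{ij}(\boldsymbol{\theta}) = E\left[\frac{\partial \log p(\mathbf{x}|\boldsymbol{\theta})}{\partial \theta_i}\,\frac{\partial \log p(\mathbf{x}|\boldsymbol{\theta})}{\partial \theta_j}\right], \tag{6.5}$$

where $E$ signifies expectation. The FIM measures the ability of the random variable $\mathbf{x}$
to discriminate a parameter value $\boldsymbol{\theta}'$ from $\boldsymbol{\theta}$ when $\boldsymbol{\theta}'$ is close to $\boldsymbol{\theta}$.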

The statistical manifold³ $S$ carries the structure of a smooth Riemannian manifold
whose metric is defined by the FIM $G(\boldsymbol{\theta})$ [46]. Here
$\{\partial \log(\cdot)/\partial \theta_i\}$ is a basis for a
vector space of random variables. The vector space is identified as the tangent space
of $S$ at $\boldsymbol{\theta}$, denoted as $\mathcal{T}_{\boldsymbol{\theta}} S$. With this structure in place,
we can bring the machinery
of Riemannian geometry to bear on statistical problems. In particular, the operations
of covariant differentiation can be defined to describe the various connections of
interest using the one-to-one correspondence between the statistical parameter model and
the Riemannian manifold [71].


3 In this report, the notation $(S, g)$ is sometimes used to signify a Riemannian manifold equipped with a
Fisher information metric of element $g$.

In a Riemannian manifold, important concepts such as distance, angle and tangent
are defined analogously to the case of the Euclidean space, but they only make
sense locally. It is possible to integrate “distance” along curves between two points and
then select a curve of the shortest length as a distance measure between the two points
on the manifold. Information geometry thus allows us to establish a genuine metric
between statistical distributions that is invariant under non-singular parameter
transformations [72]. To describe these ideas, we note that the infinitesimal
squared distance between two closely spaced distributions $p(\mathbf{x}|\boldsymbol{\theta})$ and
$p(\mathbf{x}|\boldsymbol{\theta} + d\boldsymbol{\theta})$ is
given by the quadratic form in $d\boldsymbol{\theta}$ [64],

$$ds^2 = \sum_{i,j} g_{ij}(\boldsymbol{\theta})\, d\theta_i\, d\theta_j = d\boldsymbol{\theta}^T G(\boldsymbol{\theta})\, d\boldsymbol{\theta}. \tag{6.6}$$


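Consider a curve $\boldsymbol{\theta}(t) \in \Theta$ joining $\boldsymbol{\theta}_1 = \boldsymbol{\theta}(t_1)$ and
$\boldsymbol{\theta}_2 = \boldsymbol{\theta}(t_2)$, $t_1 \le t \le t_2$, which
can be represented by a parametric equation in the parameter space with a single free
parameter $t$, and let the distance along the curve between its endpoints, namely the two
distributions $p(\mathbf{x}|\boldsymbol{\theta}_1)$ and $p(\mathbf{x}|\boldsymbol{\theta}_2)$ along
$\boldsymbol{\theta}(t)$ [73], be

$$\mathcal{D}(\boldsymbol{\theta}_1, \boldsymbol{\theta}_2) \triangleq \int_{t_1}^{t_2} \sqrt{\left(\frac{d\boldsymbol{\theta}}{dt}\right)^T G(\boldsymbol{\theta}(t))\, \frac{d\boldsymbol{\theta}}{dt}}\; dt, \tag{6.7}$$

where $\triangleq$ stands for “defined as”. This distance depends on the choice of the curve.
The distance between $p(\mathbf{x}|\boldsymbol{\theta}_1)$ and $p(\mathbf{x}|\boldsymbol{\theta}_2)$ is defined as
the minimum of such distances
over all possible curves. The integrated Fisher information distance (IFID) between
two distributions $p(\mathbf{x}|\boldsymbol{\theta}_1)$ and $p(\mathbf{x}|\boldsymbol{\theta}_2)$ is defined as the
integral along the curve $\boldsymbol{\theta}(t)$
that minimises (6.7) [74], i.e.,

$$\mathcal{D}_F(\boldsymbol{\theta}_1, \boldsymbol{\theta}_2) \triangleq \min_{\boldsymbol{\theta}(t)} \int_{t_1}^{t_2} \sqrt{\left(\frac{d\boldsymbol{\theta}}{dt}\right)^T G(\boldsymbol{\theta}(t))\, \frac{d\boldsymbol{\theta}}{dt}}\; dt. \tag{6.8}$$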
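To make the dependence of (6.7) on the chosen curve concrete, the following minimal sketch evaluates the length integral numerically. It is not taken from this report: it assumes the univariate Gaussian family N(mu, sigma^2) with coordinates theta = (mu, sigma), whose FIM is diag(1/sigma^2, 2/sigma^2), and the function names and the two test curves are hypothetical. For this family the segment with mu held fixed is a geodesic of length sqrt(2)·|ln(sigma_2/sigma_1)|, so it can serve as a reference value.

```python
import math

def fisher_metric(theta):
    """FIM of N(mu, sigma^2) in the coordinates theta = (mu, sigma) (assumed example)."""
    _, sigma = theta
    return ((1.0 / sigma**2, 0.0), (0.0, 2.0 / sigma**2))

def curve_length(curve, t1, t2, n=20000):
    """Midpoint-rule approximation of the length integral (6.7) along t -> theta(t)."""
    h = (t2 - t1) / n
    total = 0.0
    for k in range(n):
        a = curve(t1 + k * h)
        b = curve(t1 + (k + 1) * h)
        mid = tuple(0.5 * (ai + bi) for ai, bi in zip(a, b))
        d = tuple(bi - ai for ai, bi in zip(a, b))
        G = fisher_metric(mid)
        total += math.sqrt(sum(G[i][j] * d[i] * d[j]
                               for i in range(2) for j in range(2)))
    return total

def straight(t):
    """mu fixed at 0, sigma moving from 1 to 3 for t in [0, 2]."""
    return (0.0, 1.0 + t)

def detour(t):
    """Same endpoints as `straight`, but mu wanders away and back."""
    return (math.sin(math.pi * t / 2.0), 1.0 + t)

len_straight = curve_length(straight, 0.0, 2.0)
len_detour = curve_length(detour, 0.0, 2.0)
print(len_straight, len_detour)
```

Both curves join the same two distributions, yet the detour is longer; the IFID in (6.8) is the smallest such length over all admissible curves.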


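A curve on the statistical manifold which is a stationary point of the length integral (6.7) is
denoted by $\gamma(t) \in S$. It is locally the shortest path joining the two points
$p(\mathbf{x}|\boldsymbol{\theta}_1)$ and
$p(\mathbf{x}|\boldsymbol{\theta}_2)$ on the statistical manifold and is called a geodesic. The existence of curves
that are minimal in this sense is well documented in [44]. The IFID is a genuine metric,
in the sense that it satisfies the symmetry property

$$\mathcal{D}_F(\boldsymbol{\theta}_1, \boldsymbol{\theta}_2) = \mathcal{D}_F(\boldsymbol{\theta}_2, \boldsymbol{\theta}_1) \tag{6.9}$$

for all $\boldsymbol{\theta}_1, \boldsymbol{\theta}_2 \in \Theta$, and the triangle inequality

$$\mathcal{D}_F(\boldsymbol{\theta}_1, \boldsymbol{\theta}_2) + \mathcal{D}_F(\boldsymbol{\theta}_2, \boldsymbol{\theta}_3) \ge \mathcal{D}_F(\boldsymbol{\theta}_1, \boldsymbol{\theta}_3) \tag{6.10}$$

for all $\boldsymbol{\theta}_1, \boldsymbol{\theta}_2, \boldsymbol{\theta}_3 \in \Theta$.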

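While the evaluation of the IFID is generally difficult, the distance between two
distributions $p(\mathbf{x}|\boldsymbol{\theta}_1)$ and $p(\mathbf{x}|\boldsymbol{\theta}_2)$ may be
approximated with a variety of alternatives.
A popular alternative to the IFID is the Kullback-Leibler divergence (KLD) [74],

$$\begin{aligned}
\mathrm{KLD}\left[p(\mathbf{x}|\boldsymbol{\theta}_1)\,\|\,p(\mathbf{x}|\boldsymbol{\theta}_2)\right]
&= \int p(\mathbf{x}|\boldsymbol{\theta}_1) \log \frac{p(\mathbf{x}|\boldsymbol{\theta}_1)}{p(\mathbf{x}|\boldsymbol{\theta}_2)}\, d\mathbf{x} \\
&= E_{\boldsymbol{\theta}_1}\left[\log p(\mathbf{x}|\boldsymbol{\theta}_1) - \log p(\mathbf{x}|\boldsymbol{\theta}_2)\right].
\end{aligned} \tag{6.11}$$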
It is also well known that the following relationship between the KLD and the
differential Fisher information distance holds to second order in $d\boldsymbol{\theta}$ [64, 66]:

$$ds^2 = 2\,\mathrm{KLD}\left[p(\mathbf{x}|\boldsymbol{\theta})\,\|\,p(\mathbf{x}|\boldsymbol{\theta} + d\boldsymbol{\theta})\right]. \tag{6.12}$$


The KLD allows for the approximation of information distance in the absence of
statistical manifold geometry, though it is not a genuine metric [74]: in general it is
neither symmetric nor does it satisfy the triangle inequality. The failure of the KLD to be
a metric is a serious
issue in terms of its use to measure the difference between distributions. The axioms of
symmetry and the triangle inequality are natural and accord with our intuitive notion
of distance. A phrase like “the KLD between $p_1$ and $p_2$” is meaningless, for example,
without explaining that it is the KLD from $p_1$ to $p_2$. While the IFID is more difficult
to calculate, it carries the same kind of information-theoretic significance as the KLD
while being a genuine metric.
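The asymmetry of the KLD and its local agreement with the Fisher metric in (6.12) can be checked on a small example. The sketch below is illustrative only and assumes univariate Gaussian distributions, for which the KLD has a standard closed form and the Fisher information for sigma is 2/sigma^2; the function name and the numerical values are hypothetical.

```python
import math

def kld_gauss(mu1, s1, mu2, s2):
    """Closed-form KLD from N(mu1, s1^2) to N(mu2, s2^2)."""
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2.0 * s2**2) - 0.5

# Asymmetry: the divergence "from p1 to p2" differs from "from p2 to p1".
forward = kld_gauss(0.0, 1.0, 1.0, 2.0)
backward = kld_gauss(1.0, 2.0, 0.0, 1.0)

# Local relation (6.12): perturb sigma by a small d with mu held fixed.
# The Fisher information for sigma is 2 / sigma^2, so ds^2 = 2 d^2 / sigma^2.
sigma, d = 2.0, 1e-3
ds2 = 2.0 * d**2 / sigma**2
two_kld = 2.0 * kld_gauss(0.0, sigma, 0.0, sigma + d)

print(forward, backward)  # two different values
print(ds2, two_kld)       # agree up to terms of third order in d
```

The two directed divergences differ by roughly a factor of three in this example, whereas for the infinitesimal perturbation the two sides of (6.12) agree to leading order.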

An intermediate between the KLD and the IFID is the energy functional. It is the
integral of the differential KLD. As mentioned above, the KLD satisfies (6.12) for small
perturbations of $\boldsymbol{\theta}$. However, the failure of the KLD to be a metric means that integration
of this infinitesimal quantity along a geodesic does not return the KLD between the
endpoints. Thus a different quantity, namely the integral of this differential (up
to a factor of two), can be calculated. This object, well known in differential geometry
as the energy functional $\mathcal{E}(\boldsymbol{\theta})$, describes the total kinetic energy increment of a free
particle (of unit mass) moving along a curve $\boldsymbol{\theta}$ from $\boldsymbol{\theta}(t_1)$ to
$\boldsymbol{\theta}(t_2)$ in the manifold
equipped with metric $G(\boldsymbol{\theta})$, i.e.,

$$\mathcal{E}(\boldsymbol{\theta}) = \frac{1}{2} \int_{t_1}^{t_2} \left(\frac{d\boldsymbol{\theta}}{dt}\right)^T G(\boldsymbol{\theta}(t))\, \frac{d\boldsymbol{\theta}}{dt}\; dt. \tag{6.13}$$


Surprisingly, minimizing this energy $\mathcal{E}(\boldsymbol{\theta})$ with respect to the curve $\gamma(t)$ leads to
the same equations, i.e., the Euler-Lagrange equations in the local parameter $\boldsymbol{\theta}$, as
those satisfied by a geodesic parameterized by arc length $t$ [75]. In this report, we refer to the
integral (6.13) along a geodesic path $\gamma(t)$ between the two distributions as the Energy
Difference (ED) $\mathcal{E}_g(p)$.


6.3.3  Geodesics  and  exponential  map 


The IFID and the ED between two distributions are defined in terms of the shortest
geodesic in the Riemannian (statistical) manifold. There may be several different
geodesics connecting two points, as is the case for two points on a torus. In the study of sensor
networks, the trajectories of geodesics in the Euclidean space are of interest. Rigorously
speaking, the definition of a geodesic as a stationary point of the distance integral
on a smooth manifold $S$ with affine connection $\nabla$ means that the curve $\gamma(t)$ is such that
parallel transport along the curve preserves the tangent vector to the curve [76]. Using
local coordinates on $S$, the geodesic equations are given by the Euler-Lagrange
equations as [71]

$$\frac{d^2\theta_k}{dt^2} + \sum_{i=1}^{n} \sum_{j=1}^{n} \Gamma^k_{ij}\, \frac{d\theta_i}{dt}\, \frac{d\theta_j}{dt} = 0, \qquad \forall\, k \in \{1, \cdots, n\}, \tag{6.14}$$
where $\boldsymbol{\theta}(t) = [\theta_1(t), \ldots, \theta_n(t)]^T$ are the coordinates of the curve $\gamma(t)$, and
$\Gamma^k_{ij}$, $i, j = 1, \ldots, n$, are the Christoffel symbols of the second kind, defined as the Riemannian
connection coefficients

$$\Gamma^k_{ij} = \frac{1}{2} \sum_{l} g^{kl} \left( \frac{\partial g_{il}}{\partial \theta_j} + \frac{\partial g_{jl}}{\partial \theta_i} - \frac{\partial g_{ij}}{\partial \theta_l} \right), \tag{6.15}$$

where $[g^{kl}]$ is the inverse of the FIM $G = [g_{kl}]$.

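The geodesic equations in (6.14) are ordinary differential equations for the coordinates
$\boldsymbol{\theta}(t)$. A unique solution $\boldsymbol{\theta}(t)$ can be found for given initial conditions
$\boldsymbol{\theta}(0)$ and $\dot{\boldsymbol{\theta}}(0) = \boldsymbol{v}$, which are analogous to an initial position
$\boldsymbol{\theta}(0)$ and a “speed” $\boldsymbol{v} \in \mathcal{T}_{\boldsymbol{\theta}(0)} S$ in the sense
of classical mechanics.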


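Assume that a geodesic is projected onto the parameter space $\Theta$ with a starting
point $\boldsymbol{\theta}(0)$ and a tangent vector $\boldsymbol{v}$. The exponential map of the starting point is then
defined as [76]

$$\exp_{\boldsymbol{v}}[\boldsymbol{\theta}(0)] = \Psi(1;\, \boldsymbol{\theta}(0), \boldsymbol{v}), \tag{6.16}$$

where the notation $\Psi(t;\, \boldsymbol{\theta}(0), \boldsymbol{v})$ is used to signify a geodesic with a starting point
$\boldsymbol{\theta}(0)$, a tangent vector $\boldsymbol{v}$ and end point $\boldsymbol{\theta}(t)$.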

It can be shown that the length along the geodesic between $\boldsymbol{\theta}(0)$ and
$\Psi(1;\, \boldsymbol{\theta}(0), \boldsymbol{v})$
is $|\boldsymbol{v}|$ [76, 77]. The above concepts are appealing because a geodesic connects two
points in the Riemannian manifold with the minimum length. In classical mechanics,
geodesics can be thought of as trajectories of free particles in a manifold. Newton’s
laws relate the position, velocity, acceleration and various forces acting
on a body and state this relation as a differential equation for the unknown position of
the body as a function of time. When the motion of the body is at “constant speed”,
no additional force is acting on the body and the trajectory of the body is a geodesic.
Then the distance from $\boldsymbol{\theta}(0)$ to $\boldsymbol{\theta}(t)$ along the geodesic is proportional to $t$ or, more
precisely, is equal to $|\boldsymbol{v}|t$. Iterative application of exponential maps therefore
approximates flows along the geodesic, and the optimization can converge quickly
[77]. In general, obtaining the exponential map is a non-trivial task. In most cases,
the partial derivatives of the Riemannian metric tensor in (6.15) lead to a rather complicated
expression for $\Gamma^k_{ij}$, which prevents an analytical solution of the differential equations (6.14).


6.3.4  Curvatures  and  information 

In differential geometry, the Riemann curvature tensor is the
standard way to express the curvature of Riemannian manifolds. To each point it associates
a tensor that measures the extent to which the metric tensor is not locally isometric to a
Euclidean space. In local coordinates, the Riemann curvature tensor components $R^l_{ijk}$
are given by [78]:

$$R^l_{ijk} = \frac{\partial \Gamma^l_{ik}}{\partial \theta_j} - \frac{\partial \Gamma^l_{ij}}{\partial \theta_k} + \sum_{s} \left( \Gamma^l_{js} \Gamma^s_{ik} - \Gamma^l_{ks} \Gamma^s_{ij} \right), \tag{6.17}$$

where $\Gamma^l_{ij}$ are the Christoffel symbols of the second kind given in (6.15), and the
integers $i, j, k, l \in [1, n]$ are the indices of the coordinate components.
The Ricci curvature tensor represents the amount by which the shape of the volume
element of a geodesic ball in a curved Riemannian manifold deviates from that of the
standard ball in Euclidean space. As such, it provides one way of measuring the degree
to which the geometry determined by a given Riemannian metric might differ from that
of ordinary Euclidean $n$-space. The Ricci curvature tensor is essentially the unique way
of contracting the Riemann tensor [78],

$$R_{ij} = \sum_{l} R^l_{ilj} = \sum_{l} \left( \frac{\partial \Gamma^l_{ij}}{\partial \theta_l} - \frac{\partial \Gamma^l_{il}}{\partial \theta_j} \right) + \sum_{l,m} \left( \Gamma^l_{lm} \Gamma^m_{ij} - \Gamma^l_{jm} \Gamma^m_{il} \right). \tag{6.18}$$

In this report, the matrix form of the Ricci curvature tensor is denoted by $\mathbf{R} = [R_{ij}]$.

The scalar curvature (or Ricci scalar) $R$ is the simplest curvature invariant of a
Riemannian manifold and is defined as the trace of the Ricci curvature,

$$R = \sum_{i,j} g^{ij} R_{ij}, \tag{6.19}$$

where $g^{ij}$ is the $(i, j)$th element of the inverse of the FIM $G(\boldsymbol{\theta})$. To each
point on a Riemannian manifold, the scalar curvature assigns a single real number
determined by the intrinsic geometry of the manifold near that point. Specifically,
the scalar curvature represents the amount by which the volume of a geodesic ball in
a curved Riemannian manifold deviates from that of the standard ball in Euclidean
space. In two dimensions, the scalar curvature completely characterizes the curvature
of a surface [78], but this fails in higher dimensions.

In the vicinity of an initial point at which an arbitrarily selected geodesic $\gamma$ starts,
the Ricci curvature describes the second-order rate of change of the flux of geodesics
initially parallel with $\gamma$ [79]. This means that the Ricci curvature measures how the
fluxes of initially parallel geodesics change in a given direction of interest and,
therefore, provides a measure of how well the neighboring fibers of geodesics stick
together along their direction of elongation. As a special case, the flux of a bundle of
parallel geodesics (straight lines) in Euclidean space does not change along their
elongation. The behavior of a collection of geodesics reflects the value of the curvature
as well as the structure of the manifold. Non-negative Ricci curvature implies
a relatively stable bundle of geodesics, whereas negative Ricci
curvature implies a less coherent bundle of geodesics.


Geometrically, the Ricci curvature tensor is the mathematical object that controls
the growth rate of the volume of metric balls in a Riemannian manifold. The evolution
of volumes under the geodesics with parallel initial tangent vectors near a point in
the manifold is controlled by the Ricci curvature [80]. Equivalently, as noted above, it
measures how the volume element of a geodesic ball deviates in shape from that of the
standard ball in Euclidean space. Near
any point $\boldsymbol{\theta}$ in a Riemannian manifold $(S, g)$, the infinitesimal volume element $d\mu_g$ in
local normal coordinates has the following expansion at $\boldsymbol{\theta}$ [81, 82]:

$$d\mu_g = \sqrt{|G(\boldsymbol{\theta})|}\; d\mu_{\mathrm{Euclidean}} \tag{6.20}$$

$$\phantom{d\mu_g} = \left[ 1 - \frac{1}{6}\, \boldsymbol{\theta}^T \mathbf{R}\, \boldsymbol{\theta} + O(|\boldsymbol{\theta}|^3) \right] d\mu_{\mathrm{Euclidean}} \tag{6.21}$$

where $d\mu_{\mathrm{Euclidean}}$ denotes the infinitesimal volume element in the parameter space (a
Euclidean space here), $\mathbf{R}$ is the matrix of the Ricci curvature tensor given by Eq. (6.18)
and $|A|$ denotes the determinant of the matrix $A$.

Eq. (6.21) signifies that if the Ricci curvature is negative at $\boldsymbol{\theta}$ in the manifold, the
unit geodesic ball will have a larger volume than it would in Euclidean space. In the
statistical manifolds of sensor networks, the determinant of the Fisher
information matrix, $|G(\boldsymbol{\theta})|$, represents the amount of information that can be acquired by
the underlying sensor system [83]. Therefore, in this report, the Ricci scalar curvature
specifically indicates the amount of information which can be collected by the sensor
networks.

The above curvatures are defined using the canonical Levi-Civita connection (6.15),
which is a natural connection compatible with the Fisher information metric. In fact,
there are many kinds of connections. In 1972, Chentsov [47] introduced a one-parameter
family of affine connections called $\alpha$-connections, which were later popularized
by Amari [84]. In particular, $\alpha = 0$ corresponds to the Levi-Civita connection
(information connection), $\alpha = 1$ defines the e-connection (exponential connection), while
$\alpha = -1$ defines the m-connection (mixture connection). The curvatures corresponding
to these three connections are the Riemann curvature, e-curvature and m-curvature,
respectively. All of these curvatures are intrinsic.

Many statisticians have attempted to show the relationship between curvature and
statistics in the literature. The work pioneered by Efron [48] in 1975 introduced the
concept of statistical curvature and described the basic role of the curvature in the
high-order asymptotic theory of statistical inference. He proved that the second-order
information loss of a first-order efficient estimator is related to the statistical
curvature of a curve representing a one-parameter family of distributions. Dawid [85] and
Madsen [86] succeeded in extending the result of Efron to the multi-parameter case,
while Amari unified all of these theories in a differential-geometrical framework in
[84]. In that work, it was shown that the second-order information loss of a general
Fisher efficient estimator can be decomposed into the sum of two non-negative terms.
One is related to the e-curvature of the statistical model and the other is related to the
m-curvature of the ancillary subspace associated with the estimator.

Nevertheless, in this report, we mainly indicate the relationship between the
Riemann/Ricci curvature ($\alpha = 0$) and the amount of information which can be collected by
the sensor networks. This is demonstrated in three scenarios in the following section.


6.4 Application Examples in Sensor Networks


In this section, the analysis of sensor networks for target tracking via statistical
manifold techniques and information geometry is demonstrated using the following three
2D sensor network examples:

Example 1: Sensor network contains a single range-bearing sensor;
Example 2: Sensor network contains two bearings-only sensors;
Example 3: Sensor network involves three range-only sensors.
In these examples, the state of a target is represented by the 2D target location,
i.e., $\boldsymbol{\theta} = [\theta_1, \theta_2]^T = [x, y]^T$. The IFID between two targets (i.e., the geodesic length
between probability distributions in the statistical manifolds) is calculated and
compared with the corresponding KLD. The use of the IFID as a measure of the resolvability of
closely spaced targets is illustrated. In addition, the canonical Levi-Civita affine
connection as well as the Riemannian and scalar curvatures are computed to elucidate the
structure of the statistical manifold. The relationship between the Ricci curvature
tensor field and the maximum amount of information that can be obtained by the network
sensors is discussed.


6.4.1 Geodesics and Fisher information distance

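In (6.2), it is assumed that the measurement vector $\mathbf{x}$ of the underlying sensor networks
obeys a multivariate normal distribution, which belongs to the class of exponential family
distributions⁴,

$$\mathbf{x}\,|\,\boldsymbol{\theta} \sim \mathcal{N}\big(\boldsymbol{\mu}(\boldsymbol{\theta}),\, C(\boldsymbol{\theta})\big). \tag{6.22}$$

Therefore, the likelihood of the measurement $\mathbf{x}$ is given by

$$p(\mathbf{x}|\boldsymbol{\theta}) = |2\pi C(\boldsymbol{\theta})|^{-1/2} \exp\left\{ -[\mathbf{x} - \boldsymbol{\mu}(\boldsymbol{\theta})]^T C^{-1}(\boldsymbol{\theta}) [\mathbf{x} - \boldsymbol{\mu}(\boldsymbol{\theta})]/2 \right\}. \tag{6.23}$$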


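As shown in [67], the Fisher information matrix of this type of density with respect
to $\boldsymbol{\theta}$ is of the form

$$[G(\boldsymbol{\theta})]_{ij} = \frac{\partial \boldsymbol{\mu}^T(\boldsymbol{\theta})}{\partial \theta_i}\, C^{-1}(\boldsymbol{\theta})\, \frac{\partial \boldsymbol{\mu}(\boldsymbol{\theta})}{\partial \theta_j} + \frac{1}{2}\, \mathrm{tr}\left[ C^{-1}(\boldsymbol{\theta})\, \frac{\partial C(\boldsymbol{\theta})}{\partial \theta_i}\, C^{-1}(\boldsymbol{\theta})\, \frac{\partial C(\boldsymbol{\theta})}{\partial \theta_j} \right]. \tag{6.24}$$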


A  single  range-bearing  sensor  network  -  Example  1 


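In this example, the sensor observes both the range and the bearing of a target, hence the
measurement model (6.2) is written as

$$\mathbf{x} = \begin{bmatrix} r \\ \phi \end{bmatrix} = \begin{bmatrix} \sqrt{x^2 + y^2} \\ \arctan(y/x) \end{bmatrix} + \begin{bmatrix} w_r \\ w_\phi \end{bmatrix}, \tag{6.25}$$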


where $r$ and $\phi$ signify the target range and bearing, respectively, relative to a sensor
placed at the origin of the coordinate frame, and the zero-mean additive noise
$\mathbf{w} = [w_r, w_\phi]^T$ has covariance $C(\boldsymbol{\theta})$. In view of (6.22) and (6.25), we have

$$\boldsymbol{\mu}(\boldsymbol{\theta}) = \begin{bmatrix} \sqrt{x^2 + y^2} \\ \arctan(y/x) \end{bmatrix}, \qquad C(\boldsymbol{\theta}) = \begin{bmatrix} r^4 \sigma_r^2 & 0 \\ 0 & \sigma_\phi^2 \end{bmatrix}, \tag{6.26}$$

where the term $r^4$ in the range component on the diagonal of $C(\boldsymbol{\theta})$ models the
effect that the amplitude of the radar echo signal attenuates according to the fourth power
of the target range; $\sigma_r$ and $\sigma_\phi$ are the standard deviations of the range and bearing
measurement noise, respectively.

4 Without confusion, subscripts for the temporal aspect of the parameter $\boldsymbol{\theta}$ are dropped from the notation.
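Using (6.24), we observe that the squared differential FID of Eq. (6.6) can be
calculated as

$$\begin{aligned}
ds^2 &= \sum_{i,j=1}^{2} g_{ij}\, d\theta_i\, d\theta_j \\
&= \frac{1}{(x^2 + y^2)^2} \left[ \frac{x^2}{(x^2 + y^2)\sigma_r^2} + \frac{y^2}{\sigma_\phi^2} + 8x^2 \right] dx^2 \\
&\quad + \frac{1}{(x^2 + y^2)^2} \left[ \frac{y^2}{(x^2 + y^2)\sigma_r^2} + \frac{x^2}{\sigma_\phi^2} + 8y^2 \right] dy^2 \\
&\quad + \frac{2}{(x^2 + y^2)^2} \left[ \frac{xy}{(x^2 + y^2)\sigma_r^2} - \frac{xy}{\sigma_\phi^2} + 8xy \right] dx\, dy.
\end{aligned} \tag{6.27}$$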
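The metric coefficients of this example can be cross-checked numerically against the general Gaussian FIM (6.24). The following sketch is illustrative and not part of the original analysis: it assumes the range-bearing model with mean [sqrt(x^2 + y^2), arctan(y/x)] and diagonal covariance diag(r^4 sigma_r^2, sigma_phi^2), uses the values sigma_r^2 = 1 and sigma_phi^2 = 0.04, approximates the derivatives by central differences, and all function names are hypothetical.

```python
import math

SIG_R2, SIG_PHI2 = 1.0, 0.04  # sigma_r^2 and sigma_phi^2 (assumed values)

def mean(x, y):
    """Noise-free range and bearing seen from a sensor at the origin."""
    return [math.hypot(x, y), math.atan2(y, x)]

def cov_diag(x, y):
    """Diagonal of the covariance: range variance grows with r^4."""
    r2 = x * x + y * y
    return [r2 * r2 * SIG_R2, SIG_PHI2]

def grad(f, x, y, h=1e-5):
    """Central differences; returns [df/dx, df/dy], each a list over components."""
    fx = [(a - b) / (2 * h) for a, b in zip(f(x + h, y), f(x - h, y))]
    fy = [(a - b) / (2 * h) for a, b in zip(f(x, y + h), f(x, y - h))]
    return [fx, fy]

def fim_numeric(x, y):
    """Gaussian FIM (6.24), specialised to a diagonal covariance matrix."""
    dmu, dC, C = grad(mean, x, y), grad(cov_diag, x, y), cov_diag(x, y)
    return [[sum(dmu[i][k] * dmu[j][k] / C[k]
                 + 0.5 * dC[i][k] * dC[j][k] / C[k] ** 2 for k in range(2))
             for j in range(2)] for i in range(2)]

def fim_closed(x, y):
    """Metric coefficients read off from the closed-form ds^2 of this example."""
    r2 = x * x + y * y
    gxx = (x * x / (r2 * SIG_R2) + y * y / SIG_PHI2 + 8 * x * x) / r2 ** 2
    gyy = (y * y / (r2 * SIG_R2) + x * x / SIG_PHI2 + 8 * y * y) / r2 ** 2
    gxy = (x * y / (r2 * SIG_R2) - x * y / SIG_PHI2 + 8 * x * y) / r2 ** 2
    return [[gxx, gxy], [gxy, gyy]]

Gn, Gc = fim_numeric(3.0, 4.0), fim_closed(3.0, 4.0)
print(Gn)
print(Gc)
```

The terms 8x^2, 8y^2 and 8xy come entirely from the trace term of (6.24), i.e., from the dependence of the range variance on the target position.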


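The IFID integral in (6.8) is obtained by calculating the geodesic that connects the
locations of the two targets. This can be done by solving the Euler-Lagrange equations
(6.14). In this example, the equations are of the following form:

$$\begin{aligned}
\frac{d^2 x}{dt^2} + \Gamma^1_{11} \left(\frac{dx}{dt}\right)^2 + 2\Gamma^1_{12}\, \frac{dx}{dt} \frac{dy}{dt} + \Gamma^1_{22} \left(\frac{dy}{dt}\right)^2 &= 0, \\
\frac{d^2 y}{dt^2} + \Gamma^2_{11} \left(\frac{dx}{dt}\right)^2 + 2\Gamma^2_{12}\, \frac{dx}{dt} \frac{dy}{dt} + \Gamma^2_{22} \left(\frac{dy}{dt}\right)^2 &= 0,
\end{aligned} \tag{6.28}$$

where the Christoffel symbols of the second kind are (for $\sigma_r^2 = 1$, $\sigma_\phi^2 = 0.04$)

$$\begin{aligned}
\Gamma^1_{11} &= -\left(8(x^2 + y^2)^2 + 2x^2 + y^2\right) x/a, & \Gamma^2_{11} &= \left(8(x^2 + y^2)^2 + y^2\right) y/a, \\
\Gamma^1_{12} &= \Gamma^1_{21} = \Gamma^1_{11}\, y/x, & \Gamma^2_{12} &= \Gamma^2_{21} = -\left(8(x^2 + y^2)^2 + x^2 + 2y^2\right) x/a, \\
\Gamma^1_{22} &= \left(8(x^2 + y^2)^2 + x^2\right) x/a, & \Gamma^2_{22} &= \Gamma^2_{12}\, y/x,
\end{aligned} \tag{6.29}$$

with $a = (x^2 + y^2)^2 (8x^2 + 8y^2 + 1)$.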
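As a cross-check on the expressions of this example, the same metric can be written in the polar coordinates $(r, \phi)$ of the sensor by substituting $dr = (x\,dx + y\,dy)/r$ and $d\phi = (x\,dy - y\,dx)/r^2$. The following is a short derivation under the range-bearing model with covariance $\mathrm{diag}(r^4\sigma_r^2, \sigma_\phi^2)$, not a formula taken from the cited references:

```latex
ds^2 = \left( \frac{1}{r^4 \sigma_r^2} + \frac{8}{r^2} \right) dr^2
       + \frac{1}{\sigma_\phi^2}\, d\phi^2 ,
\qquad r = \sqrt{x^2 + y^2} .
```

Expanding $dr^2$ and $d\phi^2$ in terms of $dx$ and $dy$ recovers the $dx^2$, $dy^2$ and $dx\,dy$ coefficients of the Cartesian form of $ds^2$. Because the coefficient of $d\phi^2$ is constant, the bearing variance $\sigma_\phi^2$ does not enter the Christoffel symbols, which is why they depend only on $x$ and $y$ once $\sigma_r^2 = 1$ is fixed.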


The equations in (6.28) are second-order nonlinear differential equations and in general
have to be solved numerically. Each pair of initial conditions (a starting point
$\boldsymbol{\theta}(0)$ and a tangent vector $\boldsymbol{v}$) corresponds to a unique geodesic. By setting the initial
conditions in the exponential map (6.16), where $\boldsymbol{\theta}(0)$ corresponds to a location and
$\boldsymbol{v}(\varphi) = [\cos\varphi, \sin\varphi]^T$, $0 \le \varphi < 2\pi$, is a unit tangent vector, the
coordinates $\boldsymbol{\theta}(t)$ of a geodesic
can be found by solving (6.28). A map of geodesics of identical lengths, that is, a circle
in the IFID metric, is sketched in Fig. 6.2. Such a map illustrates differences between
the Euclidean distance and the information distance between two targets.
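The numerical solution of the geodesic equations of Example 1 can be sketched as follows. This is a minimal illustration rather than the procedure actually used for the figures: it assumes sigma_r^2 = 1 and sigma_phi^2 = 0.04, integrates the equations with a classical fourth-order Runge-Kutta scheme, and rescales the initial direction [cos(phi), sin(phi)] to unit length under the Fisher metric, which is one possible reading of "unit tangent vector". All function names are hypothetical.

```python
import math

def metric(x, y, sr2=1.0, sp2=0.04):
    """Fisher metric coefficients (g_xx, g_xy, g_yy) of the range-bearing sensor."""
    r2 = x * x + y * y
    gxx = (x * x / (r2 * sr2) + y * y / sp2 + 8 * x * x) / r2 ** 2
    gyy = (y * y / (r2 * sr2) + x * x / sp2 + 8 * y * y) / r2 ** 2
    gxy = (x * y / (r2 * sr2) - x * y / sp2 + 8 * x * y) / r2 ** 2
    return gxx, gxy, gyy

def christoffel(x, y):
    """Christoffel symbols of the second kind for sigma_r^2 = 1."""
    r4 = (x * x + y * y) ** 2
    a = r4 * (8 * x * x + 8 * y * y + 1)
    g111 = -(8 * r4 + 2 * x * x + y * y) * x / a
    g112 = -(8 * r4 + 2 * x * x + y * y) * y / a
    g122 = (8 * r4 + x * x) * x / a
    g211 = (8 * r4 + y * y) * y / a
    g212 = -(8 * r4 + x * x + 2 * y * y) * x / a
    g222 = -(8 * r4 + x * x + 2 * y * y) * y / a
    return g111, g112, g122, g211, g212, g222

def rhs(s):
    """First-order form of the geodesic equations; s = (x, y, dx/dt, dy/dt)."""
    x, y, vx, vy = s
    g111, g112, g122, g211, g212, g222 = christoffel(x, y)
    ax = -(g111 * vx * vx + 2 * g112 * vx * vy + g122 * vy * vy)
    ay = -(g211 * vx * vx + 2 * g212 * vx * vy + g222 * vy * vy)
    return (vx, vy, ax, ay)

def rk4_step(s, h):
    """One classical Runge-Kutta step of size h."""
    k1 = rhs(s)
    k2 = rhs(tuple(si + 0.5 * h * ki for si, ki in zip(s, k1)))
    k3 = rhs(tuple(si + 0.5 * h * ki for si, ki in zip(s, k2)))
    k4 = rhs(tuple(si + h * ki for si, ki in zip(s, k3)))
    return tuple(si + h * (p + 2 * q + 2 * u + w) / 6
                 for si, p, q, u, w in zip(s, k1, k2, k3, k4))

def speed(s):
    """Length of the tangent vector under the Fisher metric."""
    x, y, vx, vy = s
    gxx, gxy, gyy = metric(x, y)
    return math.sqrt(gxx * vx * vx + 2 * gxy * vx * vy + gyy * vy * vy)

def geodesic(x0, y0, phi, T, n=2000):
    """End state of the geodesic leaving (x0, y0) in direction phi after time T."""
    s = (x0, y0, math.cos(phi), math.sin(phi))
    sp = speed(s)
    s = (x0, y0, s[2] / sp, s[3] / sp)  # rescale to unit Riemannian speed
    h = T / n
    for _ in range(n):
        s = rk4_step(s, h)
    return s

end = geodesic(10.0, 10.0, 0.0, 5.0)
print(end[:2], speed(end))
```

Because the speed under the Fisher metric is conserved along a geodesic, the length travelled after time T equals |v|T; sweeping phi over [0, 2*pi) and collecting the end points would trace out one IFID "circle".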


Fig. 6.3(a) shows the map of geodesics for the sensor network of Example 1.
Shown is the set of geodesics
$\{\Psi(T;\, \boldsymbol{\theta}(0), \boldsymbol{v}(\varphi)),\ \varphi = 2\pi k/64,\ k = 0, \ldots, 63\}$,
with initial location $\boldsymbol{\theta}(0) = [10, 10]^T$, unit speed $|\boldsymbol{v}| = 1$ and end time $T = 20$. The
end points form a manifold “circle” with radius $|\boldsymbol{v}|T$, which is the IFID along these
geodesics. For comparison, we also plot the “circle” formed by using the KLD as a
measure of distance in Fig. 6.3(b). The two “circles” are quite different in this case. It
is important to note that, as illustrated in Fig. 6.3(d), the “equidistant” points measured
by the IFID will generally not correspond to equidistant points measured in the target
state (Euclidean) space. In Fig. 6.3(c), the “circle” of ED is plotted. This plot shows
that: 1) under the Riemannian metric, a free particle will generally travel non-equal
distances along different geodesics to accumulate the same amount of energy, and 2)
the shape of the ED circle is in between those of the IFID and KLD circles.

Figure 6.2: Illustration of geodesics that start from the same location with initial unit
tangent vectors.


Sensor  network  involving  two  bearings-only  sensors  -  Example  2 

In this example, we consider the problem of target localization using bearings-only
measurements from two passive sensors. As shown in Fig. 6.4, the two passive sensors
are located at $(\eta_i, \xi_i)$, $i = 1, 2$, and observe the bearings of a target subject to
zero-mean Gaussian random noise. The measurements satisfy (6.22) with the mean
and covariance matrix given by

$$\boldsymbol{\mu}(\boldsymbol{\theta}) = \begin{bmatrix} \arctan\dfrac{y - \xi_1}{x - \eta_1} \\[2ex] \arctan\dfrac{y - \xi_2}{x - \eta_2} \end{bmatrix}, \qquad C(\boldsymbol{\theta}) = \begin{bmatrix} \sigma_{\phi 1}^2 & 0 \\ 0 & \sigma_{\phi 2}^2 \end{bmatrix}, \tag{6.30}$$

where $\boldsymbol{\theta} = [x, y]^T$ is the target state vector and $\sigma_{\phi 1} = 0.2$ and $\sigma_{\phi 2} = 0.2$ are the
standard deviations of the measurement noise for sensors 1 and 2, respectively.


The  FIM  in  this  case  is  derived  as 
(6.33)
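As a sketch of how (6.24) specialises to a bearings-only model: assume sensor $i$ at $(\eta_i, \xi_i)$ measures $\arctan[(y - \xi_i)/(x - \eta_i)]$, that the two bearing noises are uncorrelated, and that their variances $\sigma_{\phi i}^2$ do not depend on $\boldsymbol{\theta}$. Under these assumptions the trace term in (6.24) vanishes and each sensor contributes a rank-one term. The shorthand $\Delta x_i$, $\Delta y_i$ and $d_i$ is introduced here for illustration only.

```latex
G(\boldsymbol{\theta}) = \sum_{i=1}^{2} \frac{1}{\sigma_{\phi i}^2\, d_i^4}
\begin{bmatrix}
  \Delta y_i^2 & -\Delta x_i \Delta y_i \\
  -\Delta x_i \Delta y_i & \Delta x_i^2
\end{bmatrix},
\qquad
\Delta x_i = x - \eta_i, \quad
\Delta y_i = y - \xi_i, \quad
d_i^2 = \Delta x_i^2 + \Delta y_i^2 .
```

Each term follows from $\partial \mu_i/\partial x = -\Delta y_i/d_i^2$ and $\partial \mu_i/\partial y = \Delta x_i/d_i^2$. A single bearing sensor therefore yields a singular FIM, and the target location becomes identifiable only when the two lines of sight are not collinear.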

Fig. 6.5(a) shows the geodesics $\{\Psi(t;\, \boldsymbol{\theta}(0), \boldsymbol{v})\}$, where $t \in [0, 12]$,
$\boldsymbol{\theta}(0) = [15, 15]^T$ and $\boldsymbol{v} = [\cos\varphi, \sin\varphi]^T$. The two passive sensors are located at
$(\eta_1, \xi_1) = (0, 0)$ and $(\eta_2, \xi_2) = (50, 10)$. The IFIDs along all these geodesics are equal to
$|\boldsymbol{v}|T = 12$. The set of end points of the geodesics describes a manifold circle of identical
IFIDs centered at the target location $\boldsymbol{\theta}(0)$. For comparison, the KLD circle centered
at $\boldsymbol{\theta}(0)$ is also drawn in Fig. 6.5(c). The plots are repeated at a different initial target
state $\boldsymbol{\theta}(0) = [40, 25]^T$ in Figs. 6.5(b) and 6.5(d). These plots highlight the different
responses of the IFID and KLD to changes in the target location in this sensor network
scenario. The ED circles for the above two cases are plotted in Figs. 6.5(e) and 6.5(f).

Figure 6.3: Figures of Example 1: (a) IFID circle; (b) KLD circle; (c) ED circle. All
circles are drawn in the target state space $\Theta$. (d) The IFID lengths on the statistical
manifold corresponding to a circle centered at $\boldsymbol{\theta}(0) = [10, 10]^T$ in the target state
(Euclidean) space.

Figure 6.4: The measurement model of bearings-only measurements of two sensors.


Sensor network involving three range-only sensors - Example 3


In this example, we consider an extended example of the target localization problem
in a sensor network of three range-only sensors. As shown in Fig. 6.6, the three
range-only sensors are located at $(\eta_i, \xi_i)$, $i = 1, 2, 3$, and observe the ranges between a
target and the sensors subject to random range noise. In this network configuration,
the likelihood function is described by (6.22), with the mean and covariance specified by
the measurement model (6.34). The FIM for this measurement model follows from (6.24).
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As  in  previous  examples,  the  manifold  circles  of  identical  IFIDs  and  KLDs  are  plot¬ 
ted  in  Fig.  6.7(a)  and  6.7(c),  where  the  geodesics  start  from  at  0(0)  =  [20,  10] J  and 
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Figure  6.5:  Figures  of  Example  2:  (a),  (c)  and  (e)  are  respectively  the  IFID,  KLD  and 
ED circles at $\theta(0) = [15, 15]^T$ drawn in $\Theta$, where the two passive sensors are located
at $(\eta_1, \xi_1) = (0, 0)$ and $(\eta_2, \xi_2) = (50, 10)$, respectively; (b), (d) and (f) are the
replicated plots of (a), (c) and (e), respectively, at $\theta(0) = [40, 25]^T$.

Figure 6.6: Example of a sensor network of three range-only sensors for target localisation.


end with identical lengths of $|v|\,T = 15$. The plots are repeated in Figs. 6.7(b) and 6.7(d),
where all geodesics start from $\theta(0) = [-10, 10]^T$ with identical lengths of $|v|\,T = 10$.
In this example, the three sensors are located at $(\eta_1, \xi_1) = (0, 0)$, $(\eta_2, \xi_2) = (15, 30)$
and $(\eta_3, \xi_3) = (50, 10)$ with noise standard deviations $\sigma_{r_1} = 1$, $\sigma_{r_2} = 2$ and
$\sigma_{r_3} = 3$, respectively. The ED circles centered at the aforementioned two locations are
given in Figs. 6.7(e) and 6.7(f), respectively.

It is interesting to verify the behavior of the geodesics on a large scale in this example.
Fig. 6.8 shows two geodesics $\{\gamma(t;\, \theta(0), v_i)\}$, $i = 1, 2$, $0 < t < 150$,
which start from the same location $\theta(0) = [-10, 10]^T$ and have initial tangent vectors
$v_i = [\cos\varphi_i, \sin\varphi_i]^T$ with $\varphi_1 = 0$ and $\varphi_2 = \pi/2$. Clearly, as illustrated in Fig. 6.8,
the geodesic between two points A and B in a Riemannian manifold is not unique.
In this, and every, case, the IFID between two points corresponds to the geodesic of
shortest length.

Remarks: 

1. As demonstrated in Figs. 6.3, 6.5, and 6.7, an IFID “circle” of a statistical manifold
generally does not correspond to a circle in Euclidean (target state) space
and vice versa.

2.  Geodesics  between  two  points  in  a  statistical  manifold  are  not  unique;  the  IFID 
between  two  points  is  the  length  of  the  shortest  geodesic. 

3.  One  of  the  main  differences  between  the  IFID  and  the  KLD  is  that  the  IFID 
measures  the  distance  (i.e.,  the  length  of  the  shortest  geodesic)  between  two 
points  on  a  statistical  manifold  and  it  is  a  genuine  metric  while  the  KLD  is  not. 

4. IFID may be used to measure the underlying sensor ability to resolve closely
spaced targets. In practice, a threshold of the minimum IFID required for separating
two closely spaced targets may be set. Two closely spaced targets cannot
be resolved from a measurement if the IFID between them is below the threshold.
This concept is illustrated in Fig. 6.9 in terms of a “resolution cell” in the
sensor network of Example 3. All edges of the colored areas represent identical
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Figure  6.7:  Figures  of  Example  3:  (a),  (c)  and  (e)  are  the  circles  of  FID,  KLD  and  ED 
respectively, drawn in $\Theta$ at $\theta(0) = [20, 10]^T$. (b), (d) and (f) are the replicated plots of
(a), (c) and (e), respectively, at another location $\theta(0) = [-10, 10]^T$. The three sensors are
located at $(\eta_1, \xi_1) = (0, 0)$, $(\eta_2, \xi_2) = (15, 30)$ and $(\eta_3, \xi_3) = (50, 10)$.

Figure 6.8: Illustration of two geodesics in Example 3 which start from the same location
in different directions and are of the same length.


IFIDs between the two targets, with one at the center and the other at the edge of a
colored area.
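
Remark 3 can be illustrated numerically. The following minimal Python sketch uses the standard closed-form KLD between two univariate Gaussians as an illustrative stand-in; it is not one of the sensor models above, and the function name `kld_gauss` and the parameter values are assumptions. Swapping the two arguments changes the value, so the KLD is not symmetric and therefore cannot be a metric.

```python
import math


def kld_gauss(mu_p, sigma_p, mu_q, sigma_q):
    """KL divergence D(p || q) between univariate Gaussians p and q."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)


# Hypothetical parameter values, chosen only for illustration.
d_pq = kld_gauss(0.0, 1.0, 1.0, 2.0)   # D(p || q)
d_qp = kld_gauss(1.0, 2.0, 0.0, 1.0)   # D(q || p)

print("D(p||q) =", d_pq)
print("D(q||p) =", d_qp)
```

The two printed values differ, whereas a distance defined as the length of the shortest geodesic is symmetric by construction.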




Figure  6.9:  Illustration  of  FID  circles  which  may  serve  as  the  “resolution  cells”  for 
the  underlying  sensor  network  to  measure  its  ability  to  distinguish  two  closely  spaced 
targets. 


6.4.2  Riemannian  and  scalar  curvatures 

Figure 6.10 depicts the Ricci scalar curvatures of the statistical manifold in the sensor
network of three range-only sensors (i.e., Example 3). In this example, the Ricci scalar
curvatures of the statistical manifold are less than or equal to zero. The plot provides
an additional graphical view of the properties of the underlying sensor network. For example,
it reflects the rate of change of the information which can be collected by the sensor
network at a particular point.
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Figure  6.10:  Ricci  scalar  curvatures  of  the  statistical  manifold  for  the  sensor  network 
of  Example  3.  (a)  Ricci  scalar  curvature  distribution,  (b)  Colour  map  of  the  scalar 
curvature. 


6.4.3  Ricci  curvature  tensor  field 


Figure 6.11: Bundle of geodesics of identical FIDs with parallel initial tangent vectors
for the sensor network of three range-only sensors. (a) Case 1 illustrates a divergent
geodesic bundle; (b) Case 2 shows a convergent bundle.


Figure 6.11 shows a bundle of geodesics of identical FIDs with parallel initial tangent
vectors in the sensor network of Example 3. Clearly, the geodesics in Case 1 deviate
from each other in the target state plane, whereas the geodesics in Case 2 exhibit
another type of deviation, i.e., they deviate from each other in a direction
that is perpendicular to the plot plane in some region. This behavior of a bundle
of initially parallel geodesics indicates the structure of the manifold corresponding to
an underlying sensor network. It is related to the information change rate of the sensor
network, which can be measured by the Ricci curvature.

Figure  6.12  represents  the  Ricci  curvature  tensor  field  of  the  statistical  manifold  of 
Example  3.  Taking  sign  into  account,  the  Ricci  curvature  tensor  field  can  be  regarded  as 
information  ellipses  which  indicate  both  the  amplitude  and  direction  of  the  information 
change  rate  observable  to  the  network.  This  information  is  important  for  target  tracking 
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Figure  6.12:  Ricci  curvature  tensor  field  of  the  manifold  with  range-only  measurements 
from  three  sensors. 


in  the  sensor  network.  In  Figure  6.13,  the  surface  of  the  determinant  of  the  FIM, 
which  serves  as  the  metric  in  the  statistical  manifold  of  Example  3,  is  plotted  on  a 
logarithm  scale.  It  illustrates  the  amount  of  information  that  can  be  acquired  by  the 
sensor  network  when  the  target  is  in  a  particular  position. 


Figure  6.13:  The  surface  of  the  determinant  of  the  FIM  in  Example  3  plotted  on  a 
logarithm  scale. 


Discussions: 

1. As demonstrated in Figure 6.10, the Ricci scalar curvatures of the statistical manifold
in Example 3 are negative. Interestingly, the scalar curvatures of the statistical
manifolds of the other two examples vanish everywhere. A manifold of
zero Ricci curvature is called a Ricci-flat manifold, and this indicates that the manifold
is locally flat everywhere, i.e., a geodesic ball of the manifold is locally geometrically
identical to the standard ball in Euclidean space. This can also be
seen from Equation (6.21), where the vanishing of the Ricci curvature for a sensor
network reflects the fact that the Fisher information volume near a point on the
statistical manifold has the same geometric shape as that described in Euclidean
space.

2. Examples 2 and 3 may be generalised to an $N$-sensor case (i.e., $N = 2, 3, \ldots$ for
Example 2 and $N = 3, 4, \ldots$ for Example 3). Our calculations indicate that the
Ricci scalar curvatures can be positive when $N > 3$ for Example 3. The statistical
manifold is no longer Ricci flat in the case of $N > 2$ sensors as presented in
Example 2. These observations suggest a strategic optimisation method, based on
the required values of the Ricci scalar curvature, for the sensor placement
problem in sensor network design. This topic is currently being investigated as
continuing research work.

3. As demonstrated in Fig. 6.12, the Ricci curvature tensor provides the information
change rate along a given direction on the manifold, with zero Ricci curvature
reflecting an isotropic change rate. In addition, it indicates the amount of
information able to be collected by the underlying sensor network.

4. In many sensor network optimisation problems, such as sensor scheduling, radar
waveform design, etc., the amount of information collected by the network sensors
serves as an important criterion. From the information geometry point of
view, the optimisation is achieved by changing the associated statistical manifold
structure, which is described by Riemannian curvatures. In [67], the potential of
optimal sensor scheduling via information geometry has been demonstrated.


6.5  Statistical  Manifold  Representation 


The shape of statistical manifolds of higher dimensions is difficult to describe in terms
of spaces that we are familiar with. A reasonable way is to seek a representation of the
manifold as an immersion in the Euclidean space $\mathbb{R}^n$, which will preserve the differential
structure of the original manifold and will have a derivative which is everywhere
injective. The affine immersion discussed by Dodson and Matsuzoe in [87] provides
such a framework, in which the underlying statistical manifold of sensor networks can
be realised in $\mathbb{R}^n$. In this section, we will discuss the general affine immersion for the
multivariate Gaussian manifold corresponding to our sensor network examples.


6.5.1  Exponential  family  of  probability  density  functions 

As we mentioned earlier, the measurements of the sensors we discussed obey a multivariate
normal distribution, which belongs to the exponential family of distributions.
The exponential family, which includes popular distributions such as the Gaussian,
Poisson and Gamma, is widely used in probability and statistics due to its many desirable
properties [69].

An $n$-dimensional set of probability density functions $S = \{p_\theta \mid \theta \in \Theta \subset \mathbb{R}^n\}$ is
said to be an exponential family when the density functions can be expressed in terms
of functions $\{C, F_1, \ldots, F_n\}$ and a function $\varphi$ on $\Theta$ such that [88]:

\[ p_\theta(x) = \exp\Big\{ C(x) + \sum_i \theta_i F_i(x) - \varphi(\theta) \Big\} \tag{6.38} \]

where $x \in \Omega$ is a vector-valued measurement, $\theta = [\theta_1, \ldots, \theta_n]^T$ are the natural
coordinates or canonical parameters, $C(x)$ represents the exponential component which
is independent of $\theta$, and $F(x) = [F_1(x), \ldots, F_n(x)]^T$ are sufficient statistics for $\theta$.
The function $\varphi$ is called the potential function of the exponential family and it is found
from the normalisation condition $\int_\Omega p_\theta(x)\,dx = 1$, i.e.,

\[ \varphi(\theta) = \log \int_\Omega \exp\Big\{ C(x) + \sum_i \theta_i F_i(x) \Big\}\,dx \tag{6.39} \]

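From the definition of an exponential family, and with $\partial_i = \partial/\partial\theta_i$, we use the
log-likelihood function $l(\theta, x) = \log(p_\theta(x))$ to obtain

\[ \partial_i l(\theta, x) = F_i(x) - \partial_i \varphi(\theta) \tag{6.40} \]

and

\[ \partial_i \partial_j l(\theta, x) = -\partial_i \partial_j \varphi(\theta) \tag{6.41} \]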

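The Fisher information metric $g$ on the $n$-dimensional space of parameters $\Theta \subset \mathbb{R}^n$,
equivalently on the set $S = \{p_\theta \mid \theta \in \Theta \subset \mathbb{R}^n\}$, has coordinates [89]:

\[ g_{ij} = -\int_\Omega \partial_i \partial_j l(\theta, x)\, p_\theta(x)\,dx = \partial_i \partial_j \varphi(\theta) \tag{6.42} \]

Then, $(S, g)$ is a Riemannian $n$-manifold with Levi-Civita connection given by
[89]:

\[ \Gamma^k_{ij}(\theta) = \sum_{l=1}^{n} \frac{1}{2} g^{kl} \big( \partial_i g_{jl} + \partial_j g_{il} - \partial_l g_{ij} \big)
 = \sum_{l=1}^{n} \frac{1}{2} g^{kl}\, \partial_i \partial_j \partial_l \varphi(\theta), \tag{6.43} \]

where $[g^{kl}]$ represents the inverse of $[g_{kl}]$.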

The justification of the underlying statistical manifold representation by immersion
(in the natural parameter $\theta$) in an exponential family can be viewed from the following
two points:

1. Any exponential family of distributions has a unique potential function, and
the latter completely describes this exponential family of distributions [90]. In
other words, we can fully understand the mean and covariance of the sufficient
statistics $F_i(x)$, $i = 1, \ldots, n$, in (6.38) by differentiating $\varphi(\theta)$, i.e.,

\[ E\{F_i(x)\} = \eta_i = \partial_i \varphi(\theta) \tag{6.44} \]

\[ E\{[F_i(x) - E(F_i(x))]\,[F_j(x) - E(F_j(x))]^T\} = \partial_i \partial_j \varphi(\theta) \tag{6.45} \]

2. The Fisher information matrix (6.42) and thus the connection (Christoffel symbols)
of (6.43) corresponding to the statistical manifold can all be described using
the potential function.
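
As a small numerical illustration of (6.44) and (6.45), the following Python sketch uses the Poisson family, which is mentioned above as a member of the exponential family. The identification $\theta = \log\lambda$, $F(x) = x$, $\varphi(\theta) = e^{\theta}$ is the standard one for the Poisson distribution; the finite-difference helpers `d1` and `d2` and the rate value are assumptions made only for this example.

```python
import math


def potential(theta):
    """Potential function of the Poisson family in its natural parameter."""
    return math.exp(theta)


def d1(f, t, h=1e-5):
    """Central finite-difference approximation of the first derivative."""
    return (f(t + h) - f(t - h)) / (2.0 * h)


def d2(f, t, h=1e-4):
    """Central finite-difference approximation of the second derivative."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / h ** 2


lam = 3.0                      # hypothetical Poisson rate
theta = math.log(lam)          # natural parameter
mean_from_potential = d1(potential, theta)   # should approximate E{x} = lam
var_from_potential = d2(potential, theta)    # should approximate Var{x} = lam

print(mean_from_potential, var_from_potential)
```

Both derivatives reproduce the Poisson mean and variance up to discretisation error, which is the sense in which the potential function describes the family.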
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6.5.2  Affine  immersion  of  a  manifold 


Let $M$ be an $n$-dimensional manifold, and $f$ an immersion from $M$ to $\mathbb{R}^{n+1}$, i.e., the
differential map

\[ D_p f : T_p M \to T_{f(p)} \mathbb{R}^{n+1} \tag{6.46} \]

is an injective map on the tangent space, denoted as $T_p$, at every point $p \in M$. Suppose
that $\xi$ is a local vector field along $f$. The pair $\{f, \xi\}$ is said to be an affine immersion
from $M$ to $\mathbb{R}^{n+1}$ if, for each point $p \in M$, the following formula holds [90]:

\[ T_{f(p)} \mathbb{R}^{n+1} = f_*(T_p M) \oplus \mathrm{Span}\{\xi_p\} \tag{6.47} \]

where $T_{f(p)} \mathbb{R}^{n+1}$ can be identified with $\mathbb{R}^{n+1}$ for all $f(p) \in \mathbb{R}^{n+1}$, $\oplus$ denotes the direct
sum of two subspaces and Span is the span operator on a collection of vectors in linear
algebra. We call $\xi$ a transversal vector field. Eq. (6.47) is a technical requirement to
ensure that the differential structure is preserved under the immersion.

For manifolds of multivariate Gaussian distributions, which is the case in our sensor
network scenarios, the manifold can be realized in the Euclidean space
$\mathbb{R}^{n+1}$ by the following affine immersion:

Proposition 6.5.1. Let $M$ be the multivariate Gaussian manifold with the Fisher information
metric $g$. Denote by $(\theta, \Xi)$ a natural coordinate system. Then $M$ can be
realized in $\mathbb{R}^{n+1}$ by the graph of a potential function, namely, $M$ can be realized by
the affine immersion $\{f, \xi\}$:

\[ f : M \to \mathbb{R}^{n+1}, \quad
\begin{bmatrix} \theta \\ \Xi \end{bmatrix} \mapsto \begin{bmatrix} \theta \\ \Xi \\ \varphi \end{bmatrix},
\qquad \xi = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \tag{6.48} \]

where $\varphi$ is the potential function.

We give a simple 1D example in the next sub-section to demonstrate how to find
the potential function from a given distribution.


6.5.3  An  example  of  a  Gaussian  2-manifold

The family of univariate normal or Gaussian density functions has an event space $\Omega = \mathbb{R}$
and the probability density functions are given by

\[ M = \Big\{ p(x; \mu, \sigma^2) \,\Big|\, p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},\ \mu \in \mathbb{R},\ \sigma \in \mathbb{R}^+ \Big\} \tag{6.49} \]

The mean $\mu$ and standard deviation $\sigma$ are frequently used as a local coordinate
system $(\xi_1, \xi_2) = (\mu, \sigma)$ as in [90].
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The  univariate  Gaussian  density  can  be  written  as 


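\[ \begin{aligned}
p(x; \theta_1, \theta_2) &= \exp\Big\{ \frac{\mu}{\sigma^2} x - \frac{1}{2\sigma^2} x^2 - \frac{\mu^2}{2\sigma^2} - \log(\sqrt{2\pi}\,\sigma) \Big\} \\
&= \exp\Big\{ \theta_1 x + \theta_2 x^2 - \Big( -\frac{\theta_1^2}{4\theta_2} + \frac{1}{2} \log\Big( -\frac{\pi}{\theta_2} \Big) \Big) \Big\} \\
&= \exp\{ C(x) + F_1(x)\theta_1 + F_2(x)\theta_2 - \varphi(\theta_1, \theta_2) \}
\end{aligned} \tag{6.50} \]

In the Gaussian 2-manifold, set $\theta_1 = \mu/\sigma^2$ and $\theta_2 = -1/(2\sigma^2)$. Then
$(\theta_1, \theta_2) = (\mu/\sigma^2, -1/(2\sigma^2))$ is the natural coordinate system and

\[ \varphi = -\frac{\theta_1^2}{4\theta_2} + \frac{1}{2} \log\Big( -\frac{\pi}{\theta_2} \Big) = \frac{\mu^2}{2\sigma^2} + \log(\sqrt{2\pi}\,\sigma) \tag{6.51} \]

is the corresponding potential function.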
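
Combining Proposition 6.5.1 with the potential function above, the Gaussian 2-manifold can be pictured as the graph of $\varphi$ over the natural coordinates. The following Python sketch computes a few points of that graph; the function names and the sampled $(\mu, \sigma)$ values are assumptions for illustration only, not the plotting procedure used for the figures in this report.

```python
import math


def natural_coords(mu, sigma):
    """Map (mu, sigma) to the natural coordinates (theta1, theta2)."""
    return mu / sigma ** 2, -1.0 / (2.0 * sigma ** 2)


def potential(theta1, theta2):
    """Potential function of the univariate Gaussian in natural coordinates."""
    return -theta1 ** 2 / (4.0 * theta2) + 0.5 * math.log(-math.pi / theta2)


def immersion_point(mu, sigma):
    """Point (theta1, theta2, phi) on the graph of the potential function."""
    t1, t2 = natural_coords(mu, sigma)
    return (t1, t2, potential(t1, t2))


# Hypothetical sample of local coordinates (mu, sigma).
surface = [immersion_point(mu, sigma)
           for mu in (-1.0, 0.0, 1.0)
           for sigma in (0.5, 1.0, 2.0)]

for point in surface:
    print(point)
```

The third coordinate of each point agrees with the closed form $\mu^2/(2\sigma^2) + \log(\sqrt{2\pi}\,\sigma)$, so either parameterisation can be used to draw the surface.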


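For a natural coordinate system, there exists a function $\varphi$ (potential function) on $M$
such that the Fisher information metric is given by the Hessian of $\varphi$, that is

\[ \frac{\partial^2 \varphi}{\partial\theta_i\, \partial\theta_j} = g_{ij} \tag{6.52} \]

where $[g_{ij}]$ is the Fisher information metric with respect to the natural coordinate system.

According to (6.52), the Fisher metric of the Gaussian 2-manifold with respect to
natural coordinates $(\theta_1, \theta_2)$ is given by

\[ [g_{ij}] = \begin{bmatrix} -\dfrac{1}{2\theta_2} & \dfrac{\theta_1}{2\theta_2^2} \\[2mm] \dfrac{\theta_1}{2\theta_2^2} & \dfrac{\theta_2 - \theta_1^2}{2\theta_2^3} \end{bmatrix}
= \begin{bmatrix} \sigma^2 & 2\mu\sigma^2 \\ 2\mu\sigma^2 & 2\sigma^2(2\mu^2 + \sigma^2) \end{bmatrix} \tag{6.53} \]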
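
The Hessian relation between the potential function and the Fisher metric of the Gaussian 2-manifold can be checked numerically. The sketch below differentiates the potential function by central finite differences and compares the result with the closed-form matrix in the $(\mu, \sigma)$ parameters; the helper `numeric_hessian`, the step size and the test point are assumptions for illustration.

```python
import math


def potential(t1, t2):
    """Potential function of the univariate Gaussian in natural coordinates."""
    return -t1 ** 2 / (4.0 * t2) + 0.5 * math.log(-math.pi / t2)


def numeric_hessian(f, t1, t2, h=1e-4):
    """2x2 Hessian of f at (t1, t2) by central finite differences."""
    f11 = (f(t1 + h, t2) - 2.0 * f(t1, t2) + f(t1 - h, t2)) / h ** 2
    f22 = (f(t1, t2 + h) - 2.0 * f(t1, t2) + f(t1, t2 - h)) / h ** 2
    f12 = (f(t1 + h, t2 + h) - f(t1 + h, t2 - h)
           - f(t1 - h, t2 + h) + f(t1 - h, t2 - h)) / (4.0 * h ** 2)
    return [[f11, f12], [f12, f22]]


def closed_form_metric(mu, sigma):
    """Fisher metric in natural coordinates, written in terms of (mu, sigma)."""
    s2 = sigma ** 2
    return [[s2, 2.0 * mu * s2],
            [2.0 * mu * s2, 2.0 * s2 * (2.0 * mu ** 2 + s2)]]


mu, sigma = 1.0, 2.0                       # hypothetical test point
theta1, theta2 = mu / sigma ** 2, -1.0 / (2.0 * sigma ** 2)
g_numeric = numeric_hessian(potential, theta1, theta2)
g_closed = closed_form_metric(mu, sigma)

print(g_numeric)
print(g_closed)
```

The two matrices agree up to the discretisation error of the finite differences.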


6.5.4  Manifold  representations  of  sensor  networks 

For the sensor network examples discussed in this report, the measurement errors are
characterized by multivariate Gaussian distributions, i.e.,

\[ x \sim \mathcal{N}(\mu(\xi), C(\xi)) \tag{6.54} \]

where $\xi = [x, y]^T \in \Theta$ is the state of interest in the local coordinate system.

The probability density function of the measurement error is

\[ p(x; \xi) = |2\pi C(\xi)|^{-1/2} \exp\{ -[x - \mu(\xi)]^T C^{-1}(\xi) [x - \mu(\xi)] / 2 \} \tag{6.55} \]

It can be represented in exponential form using the collection of sufficient statistics
$(x, xx^T)$. Let $\theta$ be an $m$-vector of parameters associated with the vector of sufficient
statistics $x = [x_1, \ldots, x_m]^T$, where $m$ is the dimension of $x$, and let $\Xi \in \mathbb{R}^{m \times m}$
be a symmetric matrix associated with the matrix $xx^T$. Then the multivariate Gaussian is an
exponential family of the form [91]:

\[ p(x; \theta) = \exp\{ \langle \theta, x \rangle + \langle\langle \Xi, xx^T \rangle\rangle - \varphi(\theta, \Xi) \} \tag{6.56} \]

where $\langle \theta, x \rangle = \sum_{i=1}^{m} \theta_i x_i$ is the Euclidean inner product on $\mathbb{R}^m$, and

\[ \langle\langle \Xi, xx^T \rangle\rangle = \mathrm{tr}(\Xi xx^T) = \sum_{i=1}^{m} \sum_{j=1}^{m} \Xi_{ij} x_i x_j \tag{6.57} \]

is the Frobenius inner product on symmetric matrices.

We may write the natural parameters of mixed type $(\theta, \Xi) = (C^{-1}\mu, -\frac{1}{2} C^{-1})$
with the corresponding potential function [92]

\[ \varphi(\theta, \Xi) = -\frac{1}{4} \mathrm{tr}(\Xi^{-1} \theta\theta^T) - \frac{1}{2} \log|-\Xi| + \frac{m}{2} \log\pi \tag{6.58} \]


Figure  6.14:  Affine  immersion  for  the  manifold  of  single  conventional  radar  network 
in  Example  1 . 


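The potential function $\varphi(\theta, \Xi)$ is a strictly convex and differentiable function that
specifies uniquely the exponential family. The one-to-one mapping from the original
parameters $(\mu, C)$ to the natural parameters $(\theta, \Xi)$ is given by

\[ \begin{bmatrix} \mu \\ C \end{bmatrix} \mapsto \begin{bmatrix} \theta \\ \Xi \end{bmatrix} = \begin{bmatrix} C^{-1}\mu \\ -\frac{1}{2} C^{-1} \end{bmatrix} \tag{6.59} \]

Therefore, the potential function can be expressed in terms of local parameters as follows:

\[ \varphi(\theta, \Xi) = \frac{1}{2} \mu^T C^{-1} \mu + \frac{1}{2} \log|C| + \frac{m}{2} \log 2\pi. \tag{6.60} \]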
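
The change of parameters can be checked numerically. The sketch below, written for the two-dimensional case with plain Python lists, evaluates the potential function once in the natural parameters $(\theta, \Xi) = (C^{-1}\mu, -\frac{1}{2}C^{-1})$ and once in the local parameters $(\mu, C)$, where the local form $\frac{1}{2}\mu^T C^{-1}\mu + \frac{1}{2}\log|C| + \frac{m}{2}\log 2\pi$ is what substituting the mapping into the natural-parameter form gives. The helper names and the test values of $\mu$ and $C$ are assumptions for illustration.

```python
import math


def det2(a):
    """Determinant of a 2x2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def inv2(a):
    """Inverse of a 2x2 matrix."""
    d = det2(a)
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]


def quad(v, a):
    """Quadratic form v^T a v, which equals tr(a v v^T)."""
    return sum(v[i] * a[i][j] * v[j] for i in range(2) for j in range(2))


def to_natural(mu, cov):
    """Natural parameters (theta, Xi) = (C^{-1} mu, -C^{-1} / 2)."""
    cinv = inv2(cov)
    theta = [sum(cinv[i][j] * mu[j] for j in range(2)) for i in range(2)]
    xi = [[-0.5 * c for c in row] for row in cinv]
    return theta, xi


def potential_natural(theta, xi):
    """Potential function evaluated in the natural parameters."""
    neg_xi = [[-x for x in row] for row in xi]
    return (-0.25 * quad(theta, inv2(xi))
            - 0.5 * math.log(det2(neg_xi))
            + 0.5 * 2 * math.log(math.pi))


def potential_local(mu, cov):
    """Potential function evaluated in the local parameters (mu, C)."""
    return (0.5 * quad(mu, inv2(cov))
            + 0.5 * math.log(det2(cov))
            + 0.5 * 2 * math.log(2.0 * math.pi))


mu = [1.0, 2.0]                      # hypothetical mean
cov = [[2.0, 0.5], [0.5, 1.0]]       # hypothetical covariance
theta, xi = to_natural(mu, cov)
phi_nat = potential_natural(theta, xi)
phi_loc = potential_local(mu, cov)

print(phi_nat, phi_loc)
```

The two evaluations coincide up to rounding, confirming that the two expressions describe the same surface.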


Figures  6.14-6.16  show  affine  immersions  in  R3  for  the  manifolds  of  the  three 
basic  forms  of  sensor  network  examples  discussed  previously.  Intuitively,  these  plots 
indicate  the  maximum  relative  error  of  the  parameters  in  the  applications  to  sensor 
networks. 


Discussions: 
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Figure  6.15:  Affine  immersion  for  the  manifold  of  two  bearings-only  sensor  network 
in  Example  2. 


1. For distributions of the exponential family, a statistical manifold can be represented
by its potential function in natural parameters via an immersion which
preserves the differential structure of the statistical manifold. In our examples, as
illustrated in Figures 6.14, 6.15 and 6.16, the manifold representation, given by
the potential function $\varphi(\theta, \Xi)$ in Eq. (6.60), forms a curved surface in terms of the
local parameter $[x, y]^T$. Its values under the re-parameterized parameters
$(\theta, \Xi)$ signify the possible relative error (or uncertainty) of the underlying
sensor network measurement, that is, the ratio of the measurement variance to the
measurement value.

2. The lower the surface $\varphi(\theta, \Xi)$ is, the smaller the possible relative error of a
measurement is. In particular, as shown in Fig. 6.14 and Fig. 6.15, since the
measurement model involves bearings, the value of $\varphi$ will be higher when the
bearing measurement value is smaller.

3. We observed that the manifold representations of all three examples have negative
scalar Ricci curvatures everywhere. Generally speaking, the manifold of
a Gaussian distribution in natural parameters has a negative constant curvature,
while the manifold of a curved Gaussian, as in the examples presented
in this report, is of negative curvature everywhere. The curvatures of statistical
manifolds are closely related to the state estimation problem [84] and we will
address this issue for sensor networks in separate research work.
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Figure  6. 16:  Affine  immersion  for  the  manifold  of  three  ranges-only  sensor  network  in 
Example  3. 
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Chapter  7 


Concluding  Remarks 


We have studied the detection performance for trees with unbounded height. For balanced
binary relay trees, the total error probability decays to 0 at the rate of $\sqrt{N}$, even
if the sensors are asymptotically crummy. In addition, the scaling law for the decay rate
remains the same in the case where sensors fail with certain probabilities. In the case where the
communication links fail with certain probabilities, if all the links fail with identical
probability, then the decay rate is strictly smaller than in the non-failure case. If the link
failure probabilities decay to 0 sufficiently fast towards the fusion center, then the scaling
law for the decay rate remains the same. We further investigate the overall strategy which
achieves the maximum reduction in the total error probability. We provide the
explicit solution using the dynamic programming method and Bellman’s principle. We
show that the reduction in the total error probability is a submodular function. Therefore,
the greedy strategy achieves a total error probability that is at least a factor of
the error probability achieved by the overall optimal strategy. We further study the detection
performance of M-ary relay trees, which is a more general architecture. The
impact of a non-binary message alphabet is also investigated.

In this report, the use of information geometry theory in the performance evaluation
of sensor networks is considered and the potential application of information geometry
in sensor network analysis and design is demonstrated using three basic types of
sensor network scenarios. In addition, a Euclidean immersion method for the representation
of a statistical manifold is presented. The analysis results obtained suggest
that geometrical constructs of statistical manifolds such as geodesics, Fisher information
distance and curvature can be useful for the evaluation of the sensor measurement
process and may facilitate sensor network design, evaluation and optimisation. The
results presented in this report also highlight that information geometry offers a consistent
and comprehensive means for understanding and solving sensor network issues
in target tracking.

One limitation of our results is that they apply to a particular architecture. Future
research includes analyzing more general architectures, for example, unbalanced
hierarchical architectures, trees with non-uniform degree, etc. For the communication link
failure case, our model is essentially a deletion channel model for balanced binary relay
trees. We can further consider symmetric channels, noisy channels, and even stuck
on/off channels for balanced binary relay trees and more general architectures. Another
challenging question is the correlated sensor measurement case.
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Appendix  A 


Proof  of  Proposition  2.3.1 


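If $(\alpha_k, \beta_k) \in B_m$, where $m$ is a positive integer and $m \neq 1$, then

\[ \frac{L_{k+1}}{L_k^2} = \frac{1 - (1 - \alpha_k)^2 + \beta_k^2}{(\alpha_k + \beta_k)^2}. \]

The following calculation establishes the lower bound of the ratio $L_{k+1}/L_k^2$:

\[ \begin{aligned}
L_{k+1} - L_k^2 &= 1 - (1 - \alpha_k)^2 + \beta_k^2 - (\alpha_k + \beta_k)^2 \\
&= -2\alpha_k^2 - 2\alpha_k\beta_k + 2\alpha_k \\
&= 2\alpha_k (1 - (\alpha_k + \beta_k)) \ge 0,
\end{aligned} \]

which holds in $B_m$.

To show the upper bound of the ratio $L_{k+1}/L_k^2$, it suffices to prove that

\[ \begin{aligned}
L_{k+1} - 2L_k^2 &= 1 - (1 - \alpha_k)^2 + \beta_k^2 - 2(\alpha_k + \beta_k)^2 \\
&= -3\alpha_k^2 - 4\alpha_k\beta_k + 2\alpha_k - \beta_k^2 \le 0.
\end{aligned} \]

The partial derivative with respect to $\beta_k$ is

\[ \frac{\partial (L_{k+1} - 2L_k^2)}{\partial \beta_k} = -2\beta_k - 4\alpha_k \le 0, \]

which is non-positive, and so it suffices to consider values on the upper boundary of
$B_1$. On this boundary, $1 - (1 - \alpha_k)^2 = \beta_k^2$, so that

\[ \begin{aligned}
L_{k+1} - 2L_k^2 &= 1 - (1 - \alpha_k)^2 + \beta_k^2 - 2(\alpha_k + \beta_k)^2 \\
&= 2\beta_k^2 - 2(\alpha_k + \beta_k)^2 \le 0.
\end{aligned} \]

In consequence, the claimed upper bound on the ratio $L_{k+1}/L_k^2$ holds.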
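
The two bounds can be sanity-checked on a grid. The Python sketch below assumes, based on how the boundary of $B_1$ is used in this proof and in the proof of Proposition 2.3.6, that the points of $B_m$ with $m \neq 1$ are those with $\beta_k \ge \alpha_k$, $\alpha_k + \beta_k < 1$ and $(1 - \alpha_k)^2 + \beta_k^2 \ge 1$; this region test, the grid resolution and the function names are assumptions rather than definitions taken from the text.

```python
def ratio(alpha, beta):
    """Ratio L_{k+1} / L_k^2 for one fusion step in the upper region."""
    l_k = alpha + beta
    l_next = 1.0 - (1.0 - alpha) ** 2 + beta ** 2
    return l_next / l_k ** 2


def in_region(alpha, beta):
    """Assumed membership test for the union of B_m with m != 1."""
    return (beta >= alpha
            and alpha + beta < 1.0
            and (1.0 - alpha) ** 2 + beta ** 2 >= 1.0)


def ratio_range(steps=200):
    """Smallest and largest ratio found on a regular grid over the region."""
    lo, hi = float("inf"), float("-inf")
    for i in range(steps + 1):
        for j in range(1, steps + 1):
            alpha, beta = i / steps, j / steps
            if in_region(alpha, beta):
                r = ratio(alpha, beta)
                lo, hi = min(lo, r), max(hi, r)
    return lo, hi


lo, hi = ratio_range()
print("min ratio:", lo, "max ratio:", hi)
```

On the grid the ratio stays between 1 and 2, in line with the lower and upper bounds established above; a grid check of this kind is only a sanity test and does not replace the proof.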
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Appendix  B 


Proof  of  Proposition  2.3.2 

From Proposition 2.3.1 we have, for $k = 0, 1, \ldots, m - 2$,

\[ L_{k+1} = a_k L_k^2 \]

for some $a_k \in [1, 2]$. Then for $k = 1, 2, \ldots, m - 1$,

\[ L_k = a_{k-1} \cdot a_{k-2}^{2} \cdots a_0^{2^{k-1}} L_0^{2^k}, \]

where $a_i \in [1, 2]$ for each $i$. Hence,

\[ \log L_k^{-1} = -\log a_{k-1} - 2\log a_{k-2} - \cdots - 2^{k-1} \log a_0 - \log L_0^{2^k}. \]

Since $\log L_0^{-1} > 0$ and $0 \le \log a_i \le 1$ for each $i$, we have

\[ \log L_k^{-1} \le 2^k \log L_0^{-1}. \]

Finally,

\[ \begin{aligned}
\log L_k^{-1} &\ge -1 - 2 - \cdots - 2^{k-1} + 2^k \log L_0^{-1} \\
&\ge -2^k + 2^k \log L_0^{-1} = 2^k \big( \log L_0^{-1} - 1 \big).
\end{aligned} \]
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Appendix  C 


Proof  of  Proposition  2.3.3 


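If $(\alpha_k, \beta_k) \in B_m$, where $m$ is a positive integer and $m \neq 1$, then

\[ \frac{L_{k+1}}{L_k^{\sqrt{2}}} = \frac{1 - (1 - \alpha_k)^2 + \beta_k^2}{(\alpha_k + \beta_k)^{\sqrt{2}}}. \]

To prove the upper bound of the ratio, it suffices to show that

\[ \psi(\alpha_k, \beta_k) = 1 - (1 - \alpha_k)^2 + \beta_k^2 - (\alpha_k + \beta_k)^{\sqrt{2}} \le 0. \]

The second-order partial derivative of $\psi$ with respect to $\alpha_k$ is non-positive:

\[ \frac{\partial^2 \psi}{\partial \alpha_k^2} = -2 - \sqrt{2}(\sqrt{2} - 1)(\alpha_k + \beta_k)^{\sqrt{2} - 2} \le 0. \]

Therefore, the minimum of $\partial\psi/\partial\alpha_k$ is on the lines $\alpha_k + \beta_k = 1$ and $(1 - \alpha_k)^2 + \beta_k^2 = 1$.
It is easy to show that $\partial\psi/\partial\alpha_k \ge 0$. In consequence, the maximum of $\psi$ is on the lines
$\alpha_k + \beta_k = 1$ and $(1 - \alpha_k)^2 + \beta_k^2 = 1$. If $\alpha_k + \beta_k = 1$, then it is easy to see that
$\psi = 0$. If $(1 - \alpha_k)^2 + \beta_k^2 = 1$, then $\psi = 2\beta_k^2 - (\alpha_k + \beta_k)^{\sqrt{2}}$. It is easy to show that the
maximum value of $\psi$ lies at the intersection of $\alpha_k + \beta_k = 1$ and $(1 - \alpha_k)^2 + \beta_k^2 = 1$,
where $\psi = 0$. Hence, the ratio $L_{k+1}/L_k^{\sqrt{2}}$ is upper bounded by 1.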
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Appendix  D 


Proof  of  Proposition  2.3.4 


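From Proposition 2.3.3 we have, for $k = 0, 1, \ldots, m - 2$,

\[ L_{k+1} = a_k L_k^{\sqrt{2}} \]

for some $a_k \in (0, 1]$. Then for $k = 1, 2, \ldots, m - 1$,

\[ L_k = a_{k-1} \cdot a_{k-2}^{\sqrt{2}} \cdots a_0^{\sqrt{2}^{\,k-1}} L_0^{\sqrt{2}^{\,k}}, \]

where $a_i \in (0, 1]$ for each $i$. Hence,

\[ \log L_k^{-1} = -\log a_{k-1} - \sqrt{2} \log a_{k-2} - \cdots - \sqrt{2}^{\,k-1} \log a_0 - \log L_0^{\sqrt{2}^{\,k}}. \]

Since $\log L_0^{-1} > 0$ and $\log a_i \le 0$ for each $i$, we have

\[ \log L_k^{-1} \ge \sqrt{2}^{\,k} \log L_0^{-1}. \]

Therefore, we have

\[ \log L_k^{-1} \ge \sqrt{2^k} \log L_0^{-1}. \]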
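
The way the per-step bound accumulates into a bound on $\log L_k^{-1}$ can be illustrated numerically. The Python sketch below works in the log domain and draws the per-step factors at random from $(0, 1]$; these random factors are hypothetical stand-ins for the actual ratios $L_{k+1}/L_k^{\sqrt{2}}$ of a tree, and the function name, seed and starting values are assumptions.

```python
import math
import random

SQRT2 = math.sqrt(2.0)


def lower_bound_holds(l0, steps, seed=0):
    """Check log L_k^{-1} >= sqrt(2)^k log L_0^{-1} along one random run."""
    rng = random.Random(seed)
    log_inv_l0 = math.log2(1.0 / l0)
    log_inv_lk = log_inv_l0
    for k in range(1, steps + 1):
        a = 1.0 - rng.random()      # hypothetical factor in (0, 1]
        # L_k = a * L_{k-1}^{sqrt(2)} written for log L_k^{-1}
        log_inv_lk = SQRT2 * log_inv_lk - math.log2(a)
        bound = SQRT2 ** k * log_inv_l0
        if log_inv_lk < bound * (1.0 - 1e-12):
            return False
    return True


print(lower_bound_holds(0.4, 20))
print(lower_bound_holds(0.9, 30, seed=7))
```

Because every factor is at most 1, each term $-\log a_i$ is non-negative and can only increase $\log L_k^{-1}$, which is exactly the step used in the proof.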
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Appendix  E 


Proof  of  Proposition  2.3.5 


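Because of symmetry, we only have to prove the case where $(\alpha_k, \beta_k)$ lies in $R_U$. We
consider two cases: $(\alpha_k, \beta_k) \in B_1$ and $(\alpha_k, \beta_k) \in B_2 \cap R_U$.

In the first case,

\[ \frac{L_{k+2}}{L_k^2} = \frac{(1 - (1 - \alpha_k)^2)^2 + 1 - (1 - \beta_k^2)^2}{(\alpha_k + \beta_k)^2}. \]

To prove the lower bound of the ratio, it suffices to show that

\[ \begin{aligned}
L_{k+2} - L_k^2 &= (1 - (1 - \alpha_k)^2)^2 + 1 - (1 - \beta_k^2)^2 - (\alpha_k + \beta_k)^2 \\
&= (1 - \alpha_k - \beta_k)\big( (\beta_k - \alpha_k)^3 + 2\alpha_k\beta_k(\beta_k - \alpha_k) + (\beta_k - \alpha_k)^2 + 2\alpha_k^2 \big) \ge 0.
\end{aligned} \]

We have $1 - \alpha_k - \beta_k \ge 0$ and $\beta_k \ge \alpha_k$ for all $(\alpha_k, \beta_k) \in B_1$, resulting in the above
inequality.

To prove the upper bound of the ratio, it suffices to show that

\[ L_{k+2} - 2L_k^2 = \alpha_k^4 - 4\alpha_k^3 + 2\alpha_k^2 - 4\alpha_k\beta_k - \beta_k^4 \le 0. \]

The partial derivative with respect to $\beta_k$ is

\[ \frac{\partial (L_{k+2} - 2L_k^2)}{\partial \beta_k} = -4\alpha_k - 4\beta_k^3 \le 0, \]

which is non-positive. Therefore, it suffices to consider its values on the curve $\beta_k = \alpha_k$,
on which $L_{k+2} - 2L_k^2$ is clearly non-positive.

Now we consider the second case, namely $(\alpha_k, \beta_k) \in B_2 \cap R_U$, which gives

\[ \frac{L_{k+2}}{L_k^2} = \frac{1 - (1 - \alpha_k)^4 + \beta_k^4}{(\alpha_k + \beta_k)^2}. \]

To prove the lower bound of the ratio, it suffices to show that

\[ \begin{aligned}
L_{k+2} - L_k^2 &= (1 - (1 - \alpha_k)^4) + \beta_k^4 - (\alpha_k + \beta_k)^2 \\
&= (1 - \alpha_k - \beta_k)\big( \alpha_k^3 - \alpha_k^2\beta_k - 3\alpha_k^2 + \alpha_k\beta_k^2 + 2\alpha_k\beta_k - \beta_k^3 - \beta_k^2 + 4\alpha_k \big) \ge 0.
\end{aligned} \]

Therefore, it suffices to show that

\[ \phi(\alpha_k, \beta_k) = \alpha_k^3 - \alpha_k^2\beta_k - 3\alpha_k^2 + \alpha_k\beta_k^2 + 2\alpha_k\beta_k - \beta_k^3 - \beta_k^2 + 4\alpha_k \ge 0. \]

The partial derivative with respect to $\beta_k$ is

\[ \frac{\partial \phi}{\partial \beta_k} = -(\alpha_k - \beta_k)^2 - 2\beta_k^2 + 2(\alpha_k - \beta_k) \le 0. \]

Thus, it is enough to consider the values on the upper boundaries $\sqrt{1 - \beta_k} + \sqrt{\alpha_k} = 1$
and $\alpha_k + \beta_k = 1$.

If $\alpha_k + \beta_k = 1$, then the inequality is trivial, and if $\sqrt{1 - \beta_k} + \sqrt{\alpha_k} = 1$, then

\[ L_{k+2} - L_k^2 = 2\alpha_k^2 (1 - 2\sqrt{\alpha_k})(2\alpha_k - 6\sqrt{\alpha_k} + 5) \]

and the inequality holds because $\alpha_k \le \frac{1}{4}$ in region $B_2 \cap R_U$.

The claimed upper bound for the ratio $L_{k+2}/L_k^2$ can be written as

\[ \begin{aligned}
L_{k+2} - 2L_k^2 &= (1 - (1 - \alpha_k)^4) + \beta_k^4 - 2(\alpha_k + \beta_k)^2 \\
&= -\alpha_k^4 + 4\alpha_k^3 - 8\alpha_k^2 + 4\alpha_k - 4\alpha_k\beta_k + \beta_k^4 - 2\beta_k^2 \le 0.
\end{aligned} \]

The partial derivative with respect to $\beta_k$ is

\[ \frac{\partial (L_{k+2} - 2L_k^2)}{\partial \beta_k} = -4\alpha_k + 4\beta_k^3 - 4\beta_k \le 0. \]

Again, it is sufficient to consider values on the upper boundary of $B_1$. Hence,

\[ L_{k+2} - 2L_k^2 = 2\beta_k^2 - 2(\alpha_k + \beta_k)^2 \le 0. \]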
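
The factorizations used in this proof are polynomial identities, so they can be checked at arbitrary points. The following Python sketch compares both sides of the two lower-bound factorizations at random points of the unit square and checks the boundary expression on the curve $\sqrt{1 - \beta_k} + \sqrt{\alpha_k} = 1$ by substituting $\alpha_k = s^2$, $\beta_k = 2s - s^2$; the function names, sample size and seed are assumptions for illustration.

```python
import random


def case1_lhs(a, b):
    """L_{k+2} - L_k^2 in the first case, (alpha_k, beta_k) in B_1."""
    return (1 - (1 - a) ** 2) ** 2 + 1 - (1 - b ** 2) ** 2 - (a + b) ** 2


def case1_rhs(a, b):
    """Factorized form of the first-case expression."""
    return (1 - a - b) * ((b - a) ** 3 + 2 * a * b * (b - a)
                          + (b - a) ** 2 + 2 * a ** 2)


def case2_lhs(a, b):
    """L_{k+2} - L_k^2 in the second case."""
    return (1 - (1 - a) ** 4) + b ** 4 - (a + b) ** 2


def case2_rhs(a, b):
    """Factorized form of the second-case expression."""
    return (1 - a - b) * (a ** 3 - a ** 2 * b - 3 * a ** 2 + a * b ** 2
                          + 2 * a * b - b ** 3 - b ** 2 + 4 * a)


def boundary_rhs(a):
    """Closed form on the curve sqrt(1 - beta) + sqrt(alpha) = 1."""
    s = a ** 0.5
    return 2 * a ** 2 * (1 - 2 * s) * (2 * a - 6 * s + 5)


def max_mismatch(samples=1000, seed=1):
    """Largest absolute difference between both sides of the identities."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        a, b = rng.random(), rng.random()
        worst = max(worst, abs(case1_lhs(a, b) - case1_rhs(a, b)))
        worst = max(worst, abs(case2_lhs(a, b) - case2_rhs(a, b)))
        s = 0.5 * rng.random()              # alpha = s^2 <= 1/4
        a_s, b_s = s * s, 2 * s - s * s     # point on the boundary curve
        worst = max(worst, abs(case2_lhs(a_s, b_s) - boundary_rhs(a_s)))
    return worst


worst = max_mismatch()
print("largest mismatch:", worst)
```

The mismatch stays at the level of floating-point rounding, which supports the algebra; the sign arguments in the proof are still needed to conclude the inequalities.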
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Appendix  F 


Proof  of  Proposition  2.3.6 


In the case where $(\alpha_k, \beta_k) \in B_2 \cap R_U$, from Proposition 2.3.3, we have $L_{k+1} \le L_k^{\sqrt{2}}$.
Moreover, it is easy to show that $L_{k+2} \le L_{k+1}$. Thus, we have $L_{k+2} \le L_k^{\sqrt{2}}$.

In the case where $(\alpha_k, \beta_k) \in B_1$, it suffices to prove that

\[ \vartheta(\alpha_k, \beta_k) = (1 - (1 - \alpha_k)^2)^2 + 1 - (1 - \beta_k^2)^2 - (\alpha_k + \beta_k)^{\sqrt{2}} \le 0. \]

We take the second-order partial derivative of $\vartheta$ with respect to $\alpha_k$ along the lines
$\alpha_k + \beta_k = c$ in this region. It is easy to show that the derivative is non-negative:

\[ \frac{\partial^2 \vartheta}{\partial \alpha_k^2} = 12\big( (1 - \alpha_k)^2 - \beta_k^2 \big) \ge 0. \]

Therefore, we conclude that the maximum of $\vartheta$ lies on the boundaries of this region.
If $\alpha_k + \beta_k = 1$, then we have $\vartheta(\alpha_k, \beta_k) = 0$. If $(1 - \alpha_k)^2 + \beta_k^2 = 1$, then we have
$\alpha_{k+1} = \beta_{k+1}$. Moreover, if $\alpha_{k+1} = \beta_{k+1}$, then we can show that $L_{k+2} = L_{k+1}$.
Hence, it suffices to show that $L_{k+1}/L_k^{\sqrt{2}} \le 1$ on the line $(1 - \alpha_k)^2 + \beta_k^2 = 1$, which
has been proved in Proposition 2.3.3. If $\beta_k = \alpha_k$, then $L_{k+1} = L_k$ and $(\alpha_{k+1}, \beta_{k+1})$
lies on the lower boundary of $R_L$, on which we have $L_{k+2}/L_{k+1}^{\sqrt{2}} \le 1$. Thus, we have
$L_{k+2}/L_k^{\sqrt{2}} \le 1$.




Appendix  G 


Proof  of  Proposition  2.3.7 


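From Proposition 2.3.6 we have, for $k = 0, 2, \ldots, \log N - 2$,

$$L_{k+2} = a_k L_k^{\sqrt{2}}$$

for some $a_k \in (0, 1]$. Then for $k = 2, 4, \ldots, \log N$, we have

$$L_k = a_{(k-2)/2} \cdot a_{(k-4)/2}^{\sqrt{2}} \cdots a_0^{(\sqrt{2})^{(k-2)/2}} L_0^{(\sqrt{2})^{k/2}},$$

where $a_i \in (0, 1]$ for each $i$. Therefore,

$$\log L_k^{-1} = -\log a_{(k-2)/2} - \sqrt{2}\log a_{(k-4)/2} - \cdots - (\sqrt{2})^{(k-2)/2}\log a_0 + (\sqrt{2})^{k/2}\log L_0^{-1}.$$

Since $\log L_0^{-1} > 0$ and $\log a_i \le 0$ for each $i$, we have

$$\log L_k^{-1} \ge (\sqrt{2})^{k/2}\log L_0^{-1}.$$

Therefore, we have

$$\log P_N^{-1} \ge \sqrt[4]{N}\,\log L_0^{-1}.$$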
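The unrolling step can be sketched in code. This is an illustrative check under stated assumptions, not the author's method: logarithms are taken base 2, the multipliers $a_k$ are drawn at random from $(0, 1]$ purely for illustration, and the function name is hypothetical.

```python
import math
import random


def unrolled_log_bound(log2_n, l0, seed=0):
    """Iterate L_{k+2} = a_k * L_k^sqrt(2) in the log domain.

    Returns (log2 of 1/L_k at k = log2_n, the lower bound
    sqrt(2)^(k/2) * log2(1/L_0)), with k assumed even.
    """
    rng = random.Random(seed)
    sqrt2 = math.sqrt(2.0)
    log_l = math.log2(l0)
    for _ in range(log2_n // 2):
        a = rng.uniform(0.05, 1.0)  # hypothetical multiplier in (0, 1]
        log_l = math.log2(a) + sqrt2 * log_l
    lower_bound = sqrt2 ** (log2_n / 2) * (-math.log2(l0))
    return -log_l, lower_bound
```

With $k = \log N$, the factor $(\sqrt{2})^{k/2}$ equals $N^{1/4}$, which is where the fourth root in the final bound comes from.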
Appendix H


Proof  of  Proposition  2.3.8 


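The first inequality is equivalent to

$$
\begin{aligned}
L_{k+1} - L_k^2 &= 1 - (1-\alpha_k)^2 + \beta_k^2 - (\alpha_k+\beta_k)^2 \\
&= 2\alpha_k\left(1 - (\alpha_k+\beta_k)\right) \ge 0,
\end{aligned}
$$

which holds for all $(\alpha_k, \beta_k) \in U$.

The second inequality is equivalent to

$$
\begin{aligned}
L_{k+1} - L_k &= 1 - (1-\alpha_k)^2 + \beta_k^2 - (\alpha_k+\beta_k) \\
&= (\alpha_k-\beta_k)\left(1 - (\alpha_k+\beta_k)\right) \le 0,
\end{aligned}
$$

which holds for all $(\alpha_k, \beta_k) \in U$.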
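Both factorizations are short enough to verify mechanically. The sketch below assumes that $U$ consists of points with $0 \le \alpha \le \beta$ and $\alpha + \beta < 1$, which is what the sign of the second factorization requires; the function names are hypothetical.

```python
def first_gap(a, b):
    """L_{k+1} - L_k^2 with L_{k+1} = 1 - (1 - a)^2 + b^2 and L_k = a + b."""
    return 1 - (1 - a) ** 2 + b ** 2 - (a + b) ** 2


def first_gap_factored(a, b):
    """Factored form 2 a (1 - (a + b))."""
    return 2 * a * (1 - (a + b))


def second_gap(a, b):
    """L_{k+1} - L_k."""
    return 1 - (1 - a) ** 2 + b ** 2 - (a + b)


def second_gap_factored(a, b):
    """Factored form (a - b)(1 - (a + b))."""
    return (a - b) * (1 - (a + b))


def sample_upper_region(n=50):
    """Grid points with 0 <= a <= b and a + b < 1 (assumed definition of U)."""
    return [(i / n, j / n) for i in range(n) for j in range(i, n) if i + j < n]
```

Together the two inequalities give $L_k^2 \le L_{k+1} \le L_k$ on $U$, which is how they are used in the later appendices.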




Appendix  I 


Proof  of  Theorem  2.3.2 


From Proposition 2.3.8, we have

$$L_1 = \tilde{a} L_0^2$$

for some $\tilde{a} \ge 1$. And, by Proposition 2.3.5, the following identity holds:

$$L_{k+2} = a_k L_k^2$$

for $k = 1, 3, \ldots, \log N - 2$ and some $a_k \in [1, 2]$. Hence, we can write

$$L_k = \tilde{a}^{2^{(k-1)/2}} \cdot a_{(k-1)/2} \cdot a_{(k-3)/2}^{2} \cdots a_1^{2^{(k-3)/2}} L_0^{2^{(k+1)/2}},$$

where $a_i \in [1, 2]$ for each $i$ and $\tilde{a} \ge 1$. Letting $k = \log N$, we have

$$\log P_N^{-1} = -2^{(k-1)/2}\log\tilde{a} - \log a_{(k-1)/2} - \cdots - 2^{(k-3)/2}\log a_1 + \sqrt{2N}\log L_0^{-1}.$$

Notice that $\log L_0^{-1} > 0$ and for each $i$, $\log a_i \ge 0$. Moreover, $\log\tilde{a} \ge 0$. Hence,

$$\log P_N^{-1} \le \sqrt{2N}\log L_0^{-1}.$$


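It follows by Proposition 2.3.8 that

$$L_k = \tilde{a} L_{k-1}$$

for some $\tilde{a} \in (0, 1]$. By Proposition 2.3.5, we have

$$L_{k+2} = a_k L_k^2$$

for $k = 0, 2, \ldots, \log N - 3$ and some $a_k \in [1, 2]$. Thus,

$$L_k = \tilde{a} \cdot a_{(k-3)/2} \cdot a_{(k-5)/2}^{2} \cdots a_0^{2^{(k-3)/2}} L_0^{2^{(k-1)/2}},$$

where $a_i \in [1, 2]$ for each $i$ and $\tilde{a} \in (0, 1]$. Hence,

$$\log P_N^{-1} = -\log\tilde{a} - \log a_{(k-3)/2} - \cdots - 2^{(k-3)/2}\log a_0 + \sqrt{\frac{N}{2}}\log L_0^{-1}.$$

Notice that $\log L_0^{-1} > 0$ and for each $i$, $0 \le \log a_i \le 1$ and $\log\tilde{a} \le 0$. Thus,

$$\log P_N^{-1} \ge -\sqrt{\frac{N}{2}} + \sqrt{\frac{N}{2}}\log L_0^{-1} = \sqrt{\frac{N}{2}}\left(\log L_0^{-1} - 1\right).$$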
Appendix J


Proof  of  Proposition  2.3.9 


The upper bound for $L_{k+1}/L_k$ is trivial. By Proposition 2.2.1, if $(\alpha_{k-1}, \beta_{k-1}) \in B_2 \cap R_U$, then

$$1 \le \frac{L_k}{L_{k-1}^2} \le 2,$$

i.e.,

$$\frac{1}{2} \le \frac{L_{k-1}^2}{L_k} \le 1,$$

and in consequence of Proposition 2.3.5, if $(\alpha_{k-1}, \beta_{k-1}) \in B_2 \cap R_U$, then

$$1 \le \frac{L_{k+1}}{L_{k-1}^2} \le 2.$$

Therefore, we have

$$\frac{1}{2} \le \frac{L_{k+1}}{L_k}.$$
Appendix K


Proof  of  Theorem  2.3.3 


If $(\alpha_m, \beta_m) \in B_2 \cap R_U$ and $m$ is even, then by Proposition 2.3.1, we have

$$L_{m+1} = \tilde{a} L_m^2$$

for some $\tilde{a} \in [1, 2]$.

By Proposition 2.3.5, we have

$$L_{k+2} = a_k L_k^2$$

for $k = 0, 2, \ldots, m-2, m+1, \ldots, \log N - 2$, and some $a_k \in [1, 2]$. Hence,

$$L_k = a_{(k-1)/2} \cdot a_{(k-3)/2}^{2} \cdots a_0^{2^{(k-1)/2}} L_0^{2^{(k+1)/2}},$$

where $a_i \in [1, 2]$ for each $i$.

Letting $k = \log N$, we have

$$\log P_N^{-1} = -\log a_{(k-1)/2} - 2\log a_{(k-3)/2} - \cdots - 2^{(k-1)/2}\log a_0 + \sqrt{2N}\log L_0^{-1}.$$

Notice that $\log L_0^{-1} > 0$ and for each $i$, $0 \le \log a_i \le 1$. Thus,

$$\log P_N^{-1} \le \sqrt{2N}\log L_0^{-1}.$$

Finally,

$$\log P_N^{-1} \ge -\sqrt{2N} + \sqrt{2N}\log L_0^{-1} = \sqrt{2N}\left(\log L_0^{-1} - 1\right).$$


If $(\alpha_m, \beta_m) \in B_2 \cap R_U$ and $m$ is odd, then by Proposition 2.3.9 we have

$$L_{m+2} = \tilde{a} L_{m+1}$$

for some $\tilde{a} \in [1/2, 1]$.

It follows from Proposition 2.3.5 that

$$L_{k+2} = a_k L_k^2$$

for $k = 0, 2, \ldots, m-1, m+2, \ldots, \log N - 2$ and some $a_k \in [1, 2]$. Therefore,

$$L_k = a_{(k-3)/2} \cdot a_{(k-5)/2}^{2} \cdots a_0^{2^{(k-3)/2}} \cdot \tilde{a}^{2^{(k-m-2)/2}} L_0^{2^{(k-1)/2}},$$

where $a_i \in [1, 2]$ for each $i$ and $\tilde{a} \in [1/2, 1]$. Hence,

$$\log P_N^{-1} = -\sqrt{\frac{N}{2^{m+2}}}\log\tilde{a} - \log a_{(k-3)/2} - \cdots - \sqrt{\frac{N}{8}}\log a_0 + \sqrt{\frac{N}{2}}\log L_0^{-1}.$$

Notice that $\log L_0^{-1} > 0$ and for each $i$, $0 \le \log a_i \le 1$ and $-1 \le \log\tilde{a} \le 0$. Thus,

$$\log P_N^{-1} \le \sqrt{\frac{N}{2}}\log L_0^{-1} + \sqrt{\frac{N}{2^{m+2}}}.$$

Finally,

$$\log P_N^{-1} \ge -\sqrt{\frac{N}{2}} + \sqrt{\frac{N}{2}}\log L_0^{-1} = \sqrt{\frac{N}{2}}\left(\log L_0^{-1} - 1\right).$$
Appendix L


Proof  of  Theorem  2.3.4 


If $\log N < m$, then this scenario is the same as that of Corollary 2.3.1. Therefore,

$$N\left(\log L_0^{-1} - 1\right) \le \log P_N^{-1} \le N\log L_0^{-1}.$$

If $\log N > m$ and $\log N - m$ is odd, then it takes $(m-1)$ steps for the system
to move into $B_1$. After it arrives in $B_1$, there is an even number of levels left because
$\log N - m$ is odd.

By Proposition 2.3.1, we have

$$L_{k+1} = a_k L_k^2$$

for $k = 0, 1, \ldots, m-2$ and some $a_k \in [1, 2]$, and in consequence of Proposition 2.3.5,

$$L_{k+2} = a_k L_k^2$$

for $k = m-1, m+1, \ldots, \log N - 2$ and some $a_k \in [1, 2]$. Thus,

$$L_k = a_{(k+m-3)/2} \cdot a_{(k+m-5)/2}^{2} \cdots a_0^{2^{(k+m-3)/2}} L_0^{2^{(k+m-1)/2}},$$

where $a_i \in [1, 2]$ for each $i$.

Let $k = \log N$. Then we obtain

$$\log P_N^{-1} = -\log a_{(k+m-3)/2} - 2\log a_{(k+m-5)/2} - \cdots - \frac{\sqrt{2^{m-1}N}}{2}\log a_0 + \sqrt{2^{m-1}N}\log L_0^{-1}.$$

Note that $\log L_0^{-1} > 0$, and for each $i$, $0 \le \log a_i \le 1$. Thus,

$$\log P_N^{-1} \le \sqrt{2^{m-1}N}\log L_0^{-1}.$$

Finally,

$$\log P_N^{-1} \ge -\sqrt{2^{m-1}N} + \sqrt{2^{m-1}N}\log L_0^{-1} = \sqrt{2^{m-1}N}\left(\log L_0^{-1} - 1\right).$$

For the case where $\log N - m$ is even, the proof is similar and is omitted.
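The two-step relation $L_{k+2} = a_k L_k^2$ with $a_k \in [1, 2]$, which drives the bounds in this appendix and the preceding ones, can be observed by iterating the error-probability recursion directly. The sketch below is illustrative only. It assumes the upper-half map $(\alpha, \beta) \mapsto (1-(1-\alpha)^2, \beta^2)$ from Appendix M and, for $\alpha > \beta$, its mirror image $(\alpha, \beta) \mapsto (\alpha^2, 1-(1-\beta)^2)$, which is inferred from the symmetry about $\beta = \alpha$ and from the form of $L_{k+2}$ used in Appendix F. The function names and the starting point are hypothetical.

```python
def fuse(a, b):
    """One level of fusion applied to the (Type I, Type II) error pair."""
    if b >= a:
        return 1 - (1 - a) ** 2, b * b  # upper-half rule
    return a * a, 1 - (1 - b) ** 2      # assumed mirror rule for a > b


def error_sums(a0, b0, levels):
    """Return [L_0, L_1, ..., L_levels] with L_k = alpha_k + beta_k."""
    sums = [a0 + b0]
    a, b = a0, b0
    for _ in range(levels):
        a, b = fuse(a, b)
        sums.append(a + b)
    return sums


def two_step_ratios(sums):
    """Ratios L_{k+2} / L_k^2 for k = 0, 2, 4, ..."""
    return [sums[k + 2] / sums[k] ** 2 for k in range(0, len(sums) - 2, 2)]
```

For instance, `two_step_ratios(error_sums(0.3, 0.3, 10))` starts at 1.2 and stays between 1 and 2 over these ten levels. Because two levels multiply the number of sensors by four while only squaring $L_k$, the exponent of $L_0$ grows like $\sqrt{N}$, which is the scaling that appears in the bounds above.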
Appendix M


Proof  of  Proposition  2.3.10 


Without loss of generality, we consider the upper half of $S$, denoted by $S_U$. As we
shall see, the image of $S_U$ is exactly the reflection of $S_U$ with respect to the line $\beta = \alpha$
(denoted by $S_L$). We know that $S_U := \{(\alpha, \beta) \in U \mid \beta \le \sqrt{\alpha} \text{ and } \beta \ge 1 - (1-\alpha)^2\}$.

The image of $S_U$ under $f$ can be calculated by

$$(\alpha', \beta') = f(\alpha, \beta) = \left(1 - (1-\alpha)^2, \beta^2\right),$$

where $(\alpha, \beta) \in U$. The above relation is equivalent to

$$(\alpha, \beta) = \left(1 - \sqrt{1-\alpha'}, \sqrt{\beta'}\right).$$

Therefore, we can calculate images of boundaries for $R_U$ under $f$.

The image of the upper boundary $\beta \le \sqrt{\alpha}$ is

$$\sqrt{\beta'} \le \sqrt{1 - \sqrt{1-\alpha'}};$$

i.e.,

$$\alpha' \ge 1 - (1-\beta')^2,$$

and that of the lower boundary $\beta \ge 1 - (1-\alpha)^2$ is

$$\sqrt{\beta'} \ge 1 - \left(1 - \left(1 - \sqrt{1-\alpha'}\right)\right)^2;$$

i.e.,

$$\alpha' \le \sqrt{\beta'}.$$

The function $f$ is monotone. Hence, images of boundaries of $S_U$ are boundaries of
$S_L$. Notice that boundaries of $R_L$ are symmetric with those of $R_U$ about $\beta = \alpha$. We
conclude that $S$ is an invariant region.
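The boundary computation can be checked on a grid. The sketch below is a numerical illustration rather than a proof: it uses the definition of the upper half $S_U$ given in this appendix, together with the constraint $\alpha + \beta < 1$, and takes the lower half to be its reflection about $\beta = \alpha$. The function names and the tolerance are assumptions.

```python
import math


def f_upper(a, b):
    """The map f on the upper half: (a, b) -> (1 - (1 - a)^2, b^2)."""
    return 1 - (1 - a) ** 2, b * b


def in_upper_half(a, b, tol=0.0):
    """Upper half: a + b < 1, b <= sqrt(a), and b >= 1 - (1 - a)^2."""
    return (a + b < 1
            and b <= math.sqrt(a) + tol
            and b >= 1 - (1 - a) ** 2 - tol)


def in_lower_half(a, b, tol=1e-12):
    """Reflection of the upper half about b = a (swap the coordinates)."""
    return in_upper_half(b, a, tol)


def image_stays_in_reflection(n=200):
    """True if f maps every grid point of the upper half into the lower half."""
    for i in range(1, n):
        for j in range(1, n):
            a, b = i / n, j / n
            if in_upper_half(a, b) and not in_lower_half(*f_upper(a, b)):
                return False
    return True
```

The small tolerance in `in_lower_half` only absorbs floating-point rounding for grid points that lie exactly on a boundary, such as $(0.25, 0.5)$, whose image $(0.4375, 0.25)$ lies on the reflected boundary.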


115 


Bibliography 


[1]  R.  R.  Tenney  and  N.  R.  Sandell,  “Detection  with  distributed  sensors,”  IEEE  Trans. 
Aerosp.  Electron.  Syst.,  vol.  AES-17,  no.  4,  pp.  501-510,  Jul.  1981. 

[2]  Z.  Chair  and  R  K.  Varshney,  “Optimal  data  fusion  in  multiple  sensor  detection 
systems,”  IEEE  Trans.  Aerosp.  Electron.  Syst.,  vol.  AES-22,  no.  1,  pp.  98-101, 
Jan.  1986. 

[3]  J.-F.  Chamberland  and  V.  V.  Veeravalli,  “Asymptotic  results  for  decentralized  de¬ 
tection  in  power  constrained  wireless  sensor  networks,”  IEEE  J.  Sel.  Areas  Com- 
mun.,  vol.  22,  no.  6,  pp.  1007-1015,  Aug.  2004. 

[4]  J.  N.  Tsitsiklis,  “Decentralized  detection,”  Advances  in  Statistical  Signal  Process¬ 
ing,  vol.  2,  pp.  297-344,  1993. 

[5]  G.  Polychronopoulos  and  J.  N.  Tsitsiklis,  “Explicit  solutions  for  some  simple 
decentralized  detection  problems,”  IEEE  Trans.  Aerosp.  Electron.  Syst.,  vol.  26, 
no.  2,  pp.  282-292,  Mar.  1990. 

[6]  W.  P.  Tay,  J.  N.  Tsitsiklis,  and  M.  Z.  Win,  “Asymptotic  performance  of  a  censoring 
sensor  network,”  IEEE  Trans.  Inform.  Theory,  vol.  53,  no.  11,  pp.  4191  -4209, 
Nov.  2007. 

[7]  P.  Willett  and  D.  Warren,  “The  suboptimality  of  randomized  tests  in  distributed 
and  quantized  detection  systems,”  IEEE  Trans.  Inform.  Theory,  vol.  38,  no.  2,  pp. 
355-361,  Mar.  1992. 

[8]  R.  Viswanathan  and  P.  K.  Varshney,  “Distributed  detection  with  multiple  sensors: 
Part  I — Fundamentals,”  Proc.  IEEE,  vol.  85,  no.  1,  pp.  54—63,  Jan.  1997. 

[9]  R.  S.  Blum,  S.  A.  Kassam,  and  H.  V.  Poor,  “Distributed  detection  with  multiple 
sensors:  Part  II — Advanced  topics,”  Proc.  IEEE,  vol.  85,  no.  1,  pp.  64-79,  Jan. 
1997. 

[10]  T.  M.  Duman  and  M.  Salehi,  “Decentralized  detection  over  multiple-access  chan¬ 
nels,”  IEEE  Trans.  Aerosp.  Electron.  Syst.,  vol.  34,  no.  2,  pp.  469-476,  Apr.  1998. 

[11]  B.  Chen  and  P  K.  Willett,  “On  the  optimality  of  the  likelihood-ratio  test  for  local 
sensor  decision  rules  in  the  presence  of  nonideal  channels,”  IEEE  Trans.  Inform. 
Theory,  vol.  51,  no.  2,  pp.  693-699,  Feb.  2005. 


116 


[12]  B.  Liu  and  B.  Chen,  “Channel -optimized  quantizers  for  decentralized  detection 
in  sensor  networks,”  IEEE  Trans.  Inform.  Theory ,  vol.  52,  no.  7,  pp.  3349-3358, 
Jul.  2006. 

[13]  B.  Chen  and  P.  K.  Varshney,  “A  Bayesian  sampling  approach  to  decision  fusion 
using  hierarchical  models,”  IEEE  Trans.  Signal  Process .,  vol.  50,  no.  8,  pp.  1809- 
1818,  Aug.  2002. 

[14]  A.  Kashyap,  “Comments  on  on  the  optimality  of  the  likelihood-ratio  test  for  local 
sensor  decision  rules  in  the  presence  of  nonideal  channels,”  IEEE  Trans.  Inform. 
Theory ,  vol.  52,  no.  3,  pp.  1274—1275,  Mar.  2006. 

[15]  H.  Chen,  B.  Chen,  and  P.  K.  Varshney,  “Further  Results  on  the  Optimality  of  the 
Likelihood-Ratio  Test  for  Local  Sensor  Decision  Rules  in  the  Presence  of  Nonideal 
Channels,”  IEEE  Trans.  Inform.  Theory ,  vol.  55,  no.  2,  pp.  828-832,  Feb.  2009. 

[16]  G.  Fellouris  and  G.  V.  Moustakides,  “Decentralized  Sequential  Hypothesis  Test¬ 
ing  Using  Asynchronous  Communication,”  IEEE  Trans.  Inform.  Theory ,  vol.  57, 
no.  1,  PP-  534-548,  Jan.  2011. 

[17]  J.  A.  Gubner,  L.  L.  Scharf,  and  E.  K.  P.  Chong,  “Exponential  error  bounds  for 
binary  detection  using  arbitrary  binary  sensors  and  an  all-purpose  fusion  rule  in 
wireless  sensor  networks,”  in  Proc.  IEEE  International  Conf.  on  Acoustics,  Speech, 
and  Signal  Process.,  Taipei,  Taiwan,  Apr.  19-24  2009,  pp.  2781-2784. 

[18]  Z.  B.  Tang,  K.  R.  Pattipati,  and  D.  L.  Kleinman,  “Optimization  of  detection 
networks:  Part  I — Tandem  structures,”  IEEE  Trans.  Syst.,  Man  and  Cybern.,  vol. 
21,  no.  5,  pp.  1044-1059,  Sept./Oct.  1991. 

[19]  R.  Viswanathan,  S.  C.  A.  Thomopoulos,  and  R.  Tumuluri,  “Optimal  serial  dis¬ 
tributed  decision  fusion,”  IEEE  Trans.  Aerosp.  Electron.  Syst.,  vol.  24,  no.  4,  pp. 
366-376,  Jul.  1988. 

[20]  W.  P.  Tay,  J.  N.  Tsitsiklis,  and  M.  Z.  Win,  “On  the  sub-exponential  decay  of 
detecion  error  probabilities  in  long  tandems,”  IEEE  Trans.  Inform.  Theory,  vol. 
54,  no.  10,  pp.  4767^1771,  Oct.  2008. 

[21]  J.  D.  Papastravrou  and  M.  Athans,  “Distributed  detection  by  a  large  team  of 
sensors  in  tandem,”  IEEE  Trans.  Aerosp.  Electron.  Syst.,  vol.  28,  no.  3,  pp.  639- 
653,  Jul.  1992. 

[22]  V.  V.  Veeravalli,  “Topics  in  Decentralized  Detection,”  Ph.D.  thesis.  University  of 
Illinois  at  Urbana-Champaign,  1992. 

[23]  Z.  B.  Tang,  K.  R.  Pattipati,  and  D.  L.  Kleinman,  “Optimization  of  detection 
networks:  Part  II — Tree  structures,”  IEEE  Trans.  Syst.,  Man  and  Cybern.,  vol.  23, 
no.  1,  pp.  211-221,  Jan./Feb.  1993. 

[24]  W.  P.  Tay,  J.  N.  Tsitsiklis,  and  M.  Z.  Win,  “Data  fusion  trees  for  detecion:  Does 
architecture  matter?,”  IEEE  Trans.  Inform.  Theory,  vol.  54,  no.  9,  pp.  4155-4168, 
Sept.  2008. 

[25]  A.  R.  Reibman  and  L.  W.  Nolte,  “Design  and  performance  comparison  of  dis¬ 
tributed  detection  networks,”  IEEE  Trans.  Aerosp.  Electron.  Syst.,  vol.  AES-23, 
no.  6,  pp.  789-797,  Nov.  1987. 


117 


[26]  W.  P.  Tay  and  J.  N.  Tsitsiklis,  “Error  Exponents  for  Decentralized  Detection  in 
Tree  Networks,”  in  Networked  Sensing  Information  and  Control  V.  Saligrama, 
Ed.,  Springer- Verlag,  New  York,  NY,  2008,  pp  73-92. 

[27]  W.  P.  Tay,  J.  N.  Tsitsiklis,  and  M.  Z.  Win,  “Bayesian  detection  in  bounded  height 
tree  networks,”  IEEE  Trans.  Signal  Process .,  vol.  57,  no.  10,  pp.  4042-4051,  Oct. 
2009. 

[28]  A.  Pete,  K.  R.  Pattipati,  and  D.  L.  Kleinman,  “Optimization  of  detection  networks 
with  multiple  event  structures,”  IEEE  Trans.  Autom.  Control  vol.  39,  no.  8,  pp. 
1702-1707,  Aug.  1994. 

[29]  O.  P.  Kreidl  and  A.  S.  Willsky,  “An  efficient  message-passing  algorithm  for  op¬ 
timizing  decentralized  detection  networks,”  IEEE  Trans.  Autom.  Control,  vol.  55, 
no.  3,  pp.  563-578,  Mar.  2010. 

[30]  S.  Alhakeem  and  P.  K.  Varshney,  “A  unified  approach  to  the  design  of  decentral¬ 
ized  detection  systems,”  IEEE  Trans.  Aerosp.  Electron.  Syst.,  vol.  31,  no.  1,  pp. 
9-20,  Jan.  1995. 

[31]  Y.  Lin,  B.  Chen,  and  P.  K.  Varshney,  “Decision  fusion  rules  in  multi-hop  wireless 
sensor  networks,”  IEEE  Trans.  Aerosp.  Electron.  Syst.,  vol.  41,  no.  2,  pp.  475-488, 
Apr.  2005. 

[32]  J.  A.  Gubner,  E.  K.  P.  Chong,  and  L.  L.  Scharf,  “Aggregation  and  compression 
of  distributed  binary  decisions  in  a  wireless  sensor  network,”  in  Proc.  Joint  48th 
IEEE  Conf.  on  Decision  and  Control  and  28th  Chinese  Control  Confi,  Shanghai, 
P.  R.  China,  Dec.  16-18  2009,  pp.  909-913. 

[33]  Z.  Zhang,  A.  Pezeshki,  W.  Moran,  S.  D.  Howard,  and  E.  K.  P.  Chong,  “Error 
probability  bounds  for  balanced  binary  fusion  trees,”  IEEE  Trans.  Inform.  Theory, 
to  appear.  Available  from  [arXiv:l  105.1 187vl]. 

[34]  Z.  Zhang,  A.  Pezeshki,  W.  Moran,  S.  D.  Howard,  and  E.  K.  P.  Chong,  “Error 
probability  bounds  for  binary  relay  trees  with  crummy  sensors,”  in  Proc.  the  201 1 
Workshop  on  Defense  Applications  of  Signal  Processing  (DASP’ll),  The  Hyatt 
Coolum  Resort,  Coolum,  Queensland,  Australia,  Jul.  10-14,  2011  (Invited  Paper). 

[35]  Z.  Zhang,  A.  Pezeshki,  W.  Moran,  S.  D.  Howard,  and  E.  K.  P.  Chong,  “Error 
probability  bounds  for  binary  relay  trees  with  unreliable  communication  links,”  in 
Proc.  of  the  Asilomar  Conference  on  Signals,  Systems,  and  Computers ,  Asilomar 
Hotel  and  Conference  Grounds,  Pacific  Grove,  California,  November  6-9,  2011,  to 
appear. 

[36]  Y.  Kanoria  and  A.  Montanari,  “Subexponential  convergence  for  information  ag¬ 
gregation  on  regular  trees,”  in  Proc.  Joint  50th  IEEE  Conf.  on  Decision  and  Control 
and  European  Control  Conf,  Orlando,  Florida,  Dec.  12-15  2011,  to  appear. 

[37]  Z.  Zhang,  E.  K.  P.  Chong,  A.  Pezeshki,  W.  Moran,  and  S.  D.  Howard,  “Error 
probability  bounds  for  M- ary  relay  trees,”  in  Proc.  of  International  Symposium  of 
Information  Theory,  2012,  submitted. 

[38]  Z.  Zhang,  E.  K.  P.  Chong,  A.  Pezeshki,  W.  Moran,  and  S.  D.  Howard,  “Detection 
Performance  of  A/ -ary  Relay  Trees  with  Non-binary  Message  Alphabets,”  in  Proc. 
of  Statistical  Signal  Process.  Workshop,  2012,  submitted. 


118 


[39]  Z.  Zhang,  E.  K.  P.  Chong,  A.  Pezeshki,  W.  Moran,  and  S.  D.  Howard,  “Submod¬ 
ularity  and  Optimality  of  Fusion  Rules  in  Balanced  Binary  Relay  Trees,”  in  Proc. 
of  51th  IEEE  Conf  on  Decision  and  Control,  2012,  submitted. 

[40]  P.  K.  Varshney,  Distributed  Detection  and  Data  Fusion,  Springer- Verlag,  New 
York,  NY,  1997. 

[41]  H.  L.  Van  Trees,  Detection,  Estimation,  and  Modulation  Theory,  Part  I,  John 
Wiley  and  Sons,  New  York,  NY,  1968. 

[42]  G.  Nemhauser,  L.  Wolsey,  and  M.  Fisher,  “An  analysis  of  the  approximations  for 
maximizing  submodular  set  functions,”  in  Math.  Programming,  vol.  14,  no.  1,  pp. 
265-294,  1978. 

[43]  F.  Gianfelici,  V.  Battistelli,  Methods  of  informaiton  geometry  (S.  Amari,  H.  Na- 
gaoka,  American  Mathematical  Society  and  Oxford  University  Press,  2000)  (book 
review),  IEEE  Transactions  on  Information  Theory  55  (1)  (2009)  2905-2906. 

[44]  S.  Amari,  Information  geometry  of  statistical  inference  -  an  overview,  in:  IEEE 
Information  Theory  Workshop,  Bangalore,  India,  2002,  pp.  86-89. 

[45]  S.  Amari,  H.  Nagaoka,  Methods  of  Information  Geometry,  American  Mathemat¬ 
ical  Society  and  Oxford  University  Press,  New  York,  2000. 

[46]  C.  R.  Rao,  Information  and  accuracy  attainable  in  the  estimation  of  statistical 
parameters.  Bulletin  of  the  Calcutta  Mathematical  Societ  37  (1945)  81-91. 

[47]  N.  N.  Chentsov,  Statistical  Decision  Rules  and  Optimal  Inference,  American 
Mathematical  Society,  Providence,  Rhode  Island,  1982. 

[48]  B.  Efron,  Defining  the  curvature  of  a  statistical  problem  (with  applications  to 
second  order  efficiency.  The  Annals  of  Statistics  3  (6)  (1975)  1189-1242. 

[49]  S.  Amari,  Differential-Geometrical  Methods  of  Statistics  (Lecture  Notes  in  Statis¬ 
tics),  Springer,  Berlin,  Germany,  1985. 

[50]  H.  Nagaoka,  S.  Amari,  Differential  geometry  of  smooth  families  of  probability 
distributions,  METR,  University  of  Tokyo,  Japan,  1982. 

[51]  R.  E.  Kass,  P.  W.  Vos,  Geometrical  Foundations  of  Asymptotic  Inference,  Wiley, 
New  York,  1997. 

[52]  S.  Amari,  M.  Kawanabe,  Information  geometry  of  estimating  functions  in  semi 
parametric  statistical  models,  Bernoulli  3  (1997)  29-54. 

[53]  S.  Amari,  K.  Kurata,  H.  Nagaoka,  Information  geometry  of  Boltzmann  machines, 
IEEE  Transactions  on  Neural  Networks  3  (2)  (1992)  260-271. 

[54]  S.  Amari,  Information  geometry  of  the  EM  and  em  algorithms  for  neural  net¬ 
works,  Neural  Networks  8  (9)  (1995)  1379-1408. 

[55]  S.  Amari,  Natural  gradient  works  efficiently  in  learning.  Neural  Computation 
10  (2)  (1998)  251-276. 

[56]  S.  Amari,  Fisher  information  under  restriction  of  Shannon  information.  Annals  of 
the  Institute  of  Statistical  Mathematics  41  (4)  (1989)  623-648. 


119 


[57]  L.  L.  Campbell,  The  relation  between  information  theory  and  the  differential  ge¬ 
ometry  approach  to  statistics.  Information  Sciences  35  (3)  (1985)  199-210. 

[58]  S.  Amari,  Differential  geometry  of  a  parametric  family  of  invertible  linear 
systems-Riemannian  metric,  dual  affine  connections  and  divergence.  Mathemat¬ 
ical  Systems  Theory  20  (1)  (1987)  53-82. 

[59]  A.  Ohara,  N.  Suda,  S.  Amari,  Dualistic  differential  geometry  of  positive  definite 
matrices  and  its  applications  to  related  problems.  Linear  Algebra  and  Its  Applica¬ 
tions  247  (1996)  31-53. 

[60]  A.  Ohara,  Information  geometric  analysis  of  an  interior  point  method  for  semidef- 
inite  programming.  Geometry  in  Present  Day  Science  (1999)  49-74. 

[61]  S.  Amari,  S.  Ikeda,  H.  Shimokawa,  Information  geometry  of  a -projection  in 
mean-field  approximation,  in  Recent  Developments  of  Mean  Field  Approximation, 
M.  Opper,  D.  Saad,  Eds.,  MIT  Press,  Cambridge  2000. 

[62]  T.  Tanaka,  Information  geometry  of  mean  field  approximation.  Neural  Computa¬ 
tion  12  (2)  (2000)  1951-1968. 

[63]  S.  Amari,  T.  S.  Han,  Statistical  inference  under  multiterminal  rate  restrictions:  a 
differential  geometric  approach,  IEEE  Transactions  on  Information  Theory  35  (2) 
(1989)  217-227. 

[64]  S.  Amari,  Information  geometry  on  hierarchy  of  probability  distributions,  IEEE 
Transactions  on  Information  Theory  47  (5)  (2001)  1701-1711. 

[65]  S.  T.  Smith,  Covariance,  subspace,  and  intrinsic  Cramer-Rao  bounds,  IEEE  Trans¬ 
actions  on  Signal  Processing  53  (5)  (2005)  1610-1630. 

[66]  Y.  Cheng,  X.  Wang,  B.  Moran,  Sensor  network  performance  evaluation  in  statisti¬ 
cal  manifolds,  in:  Proceedings  of  the  13th  International  Conference  on  Information 
Fusion,  Edinburgh,  Scotland,  2010. 

[67]  X.  Wang,  Y.  Cheng,  B.  Moran,  Bearings-only  tracking  analysis  via  information 
geometry,  in:  Proceedings  of  the  13th  International  Conference  on  Information 
Fusion,  Edinburgh,  Scotland,  2010. 

[68]  X.  Wang,  B.  Moran,  Multitarget  tracking  using  virtual  measurement  of  binary 
sensor  networks,  in:  Proceedings  of  the  9th  International  Conference  on  Informa¬ 
tion  Fusion,  Florence,  Italy,  2006. 

[69]  B.  Efron,  The  geometry  of  exponential  families.  The  Annals  of  Statistics  6  (2) 
(1978)  362-376. 

[70]  Y.  Bar-Shalom,  T.  E.  Fortmann,  Tracking  and  Data  Association,  Academic  Press, 
New  York,  1988. 

[71]  N.  H.  Abdel- All,  H.  N.  Abd-Ellah,  H.  M.  Moustafa,  Information  geometry  and 
statistical  manifold.  Chaos,  Solitons  and  Fractals  15  (1)  (2003)  161-172. 

[72]  F.  Barbaresco,  Innovative  tools  for  radar  signal  processing  based  on  Cartan’s  ge¬ 
ometry  of  SPD  matrices  &  information  geometry,  in:  2008  IEEE  Radar  Confer¬ 
ence,  Rome,  Italy,  2008. 


120 


[73]  M.  L.  Menendez,  D.  Morales,  L.  Pardo,  M.  Salicrij,  Statistical  tests  based  on 
geodesic  distances.  Applied  Mathematics  Letters  8  (1)  (1995)  65-69. 

[74]  K.  M.  Carter,  R.  Raich,  W.  G.  Finn,  A.  O.  Hero,  FINE:  Fisher  information  non- 
parametric  embedding,  IEEE  Transactions  on  Pattern  Analysis  and  Machine  Intel¬ 
ligence  31  (11)  (2009)  2093-2098. 

[75]  V.  Arnold,  Sur  la  geometrie  differentielle  des  groupes  de  Lie  de  dimension  infinie 
et  ses  applications  a  Fhydrodynamique  des  fluides  parfaits.  (French),  Annales  De 
L’lnstitut  Fourier  (Grenoble)  16  (1)  (1966)  319-361. 

[76]  P.  Petersen,  Riemannian  geometry.  Springer,  New  York,  1998. 

[77]  Z.  Yang,  J.  Laaksonen,  Principal  whitened  gradient  for  information  geometry. 
Neural  Networks  21  (2-3)  (2008)  232-240. 

[78]  S.  Kobayashi,  K.  Nomizu,  Foundations  of  differential  geometry,  Wiley- 
Interscience,  New  York,  1996. 

[79]  L.  Astola,  L.  Florack,  Sticky  vector  fields,  and  other  geometric  measures  on  dif¬ 
fusion  tensor  images,  in:  Proceedings  of  2008  IEEE  Computer  Society  Conference 
on  Computer  Vision  and  Pattern  Recognition  Workshops,  Anchorage,  AK,  USA, 
2008. 

[80]  Y.  Ollivier,  Ricci  curvature  of  Markov  chains  on  metric  spaces.  Journal  of  Func¬ 
tional  Analysis,  256  (2009)  810-864. 

[81]  A.  Gray,  The  volume  of  a  small  geodesic  ball  of  a  Riemannian  manifold,  Michi¬ 
gan  Mathematical  Journal  20  (4)  (1974)  329-344. 

[82]  X.  Pennec,  Intrinsic  statistics  on  Riemannian  manifolds:  Basic  tools  for  geometric 
measurements.  Journal  of  Mathematical  Imaging  and  Vision  25  (2006)  127-154. 

[83]  L.  Pronzato,  E.  Walter,  Robust  experiment  design  via  stochastic  approximation. 
Mathematical  Biosciences  75  (1)  (1985)  103-120. 

[84]  S.  Amari,  Differential  geometry  of  curved  exponential  families-curvatures  and 
information  loss.  The  Annals  of  Statistics  10  (2)  (1982)  357-385. 

[85]  A.  P.  Dawid,  Discussions  to  Efron’s  paper.  The  Annals  of  Statistics  3  (6)  (1975) 
1231-1234. 

[86]  L.  T.  Madsen,  The  geometry  of  statistical  model-a  generalization  of  curvature. 
Technical  Report,  Danish  Medical  Research  Council,  1979. 

[87]  C.  T.  J.  Dodson,  H.  Matsuzoe,  An  affine  embedding  of  the  gamma  manifold. 
Applied  Sciences  5  (1)  (2003)  7-12. 

[88]  B.  O.  Koopman,  On  distribution  admitting  a  sufficient  statistic.  Transactions  of 
the  American  Mathematical  Society  39  (3)  (1936)  399-409. 

[89]  K.  Arwini,  L.  D.  Riego,  C.  T.  J.  Dodson,  Universal  connection  and  curvature  for 
statistical  manifold  genometry,  Houston  Journal  of  Mathematics  33  (2007)  145- 
161. 


121 


[90]  K.  Arwini,  C.  T.  J.  Dodson,  Information  Geometry:  Near  Randomness  and  Near 
Independence,  Springer,  Berlin,  Germany,  2008. 

[91]  M.  J.  Wainwright,  M.  I.  Jordan,  Graphical  Models,  Exponential  Families,  and 
Variational  Inference,  Now  Publishers  Inc,  Hanover,  USA,  2008. 

[92]  F.  Nielsen,  R.  Nock,  The  entropic  centers  of  multivariate  normal  distributions, 
in:  Proceedings  of  European  Workshop  on  Computational  Geometry  (EuroCG), 
France,  2008,  pp.  221-224. 


122 


